Introduction to Erasure coding Kenji Kaneda 発表の動機と目的 • 耐故障ファイルシステム関係の論文中に Erasure codingという用語がよく現れる 例)Oceanstore, RAID, … • 詳細については余りよく知らない – アルゴリズムの効率は? – 実装にかかる手間はどれくらい? Erasure codingの一種である Reed-Solomon Codingについて調べる Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields Problem Specification (1/2) • Given – n Data devices (D1, D2, …, Dn) • Each holds k bytes – m Checksum devices (C1, C2, …, Cm) • Each holds k bytes D1 D2 m=2 C1 C2 n=8 D3 D4 D5 D6 D7 D8 Problem Specification (2/2) • Goal – Define the calculation of each Ci such that if any m of D1, D2, …, Dn, C1, C2, …, Cm fail, then the failed devices can be reconstructed from the non-failed devices D1 D2 m=2 C1 C2 n=8 D3 D4 D5 D6 D7 D8 An Example Configuration • “n+1-parity” coding (RAID Level 5) – m=1 – c1,j = d1,j ⊕ d2,j ⊕ … ⊕ dn,j where c1,j = j-th byte of C1 and di,j = j-th byte of Di D1 C1 D2 … Dn Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields General Strategy (1/4) • Partition storage devices … D1 D2 Dn … C1 C2 Cm General Strategy (2/4) • Initialize checksum devices … D1 D2 Dn … C1 C2 Cm General Strategy (3/4) • Update data and checksum devices update … D1 D2 update Dn update update … C1 C2 Cm General Strategy (4/4) • Recover storage devices from failures … D1 D2 Dn … C1 C2 Cm Partitioning of Devices (1/2) • Break up each device into words – Size of each word is w bits • w is chosen by a programmer w bits k bytes Di Partitioning of Devices (2/2) • Henceforth we assume that each device holds just 1 word (for simplicity) – data words: d1, d2, …, dn – checksum words: c1, c2, …, cm d1 d2 D1 D2 c1 c2 C1 C2 … dn Dn … cm Cm Calculation of Checksum • Define a coding function Fi (d1, d2, …, dn) – Calculates a checksum word on Ci E.g.) F1 (d1, d2, …, dn) = d1 ⊕ d2 ⊕ … ⊕ dn d1 d2 D1 D2 c1=F1(d1, …, dn) c2=F2(d1, …, dn) C1 C2 … dn Dn … cm=Fm(d1, …, dn) Cm Update of Checksum • Define an update function Gi,j (dj, dj’, ci) – Calculates a checksum word on Ci when a checksum word on Ci is ci and a data word on Dj is updated from dj to dj’ E.g.) G1,j (dj, dj’, ci) = c1 ⊕ dj ⊕ dj’ d1 d2d ’ 2 D1 D2 … c1’=G1,2c1(d2,d2’,c1) c2’=G2,2c2(d2,d2’,c2) … C1 C2 dn Dn cm(d ,d ’,c ) c3’=G3,2 2 2 3 Cm Recovery from Failure 1. Restore the words in any failed data device Dj from the words in the nonfailed devices E.g.) dj = d1 ⊕ … ⊕ dj-1 ⊕ dj+1 ⊕ … ⊕ dn ⊕ c1 2. Re-compute any failed checksum devices Ci with Fi Problem Restatement • Given n data words d1, d2, …, dn, all of size w • Define functions F and G to calculate and maintain the checksum words c1, c2, …, cm Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields Overview of Reed-Solomon Coding • Using the Vandermonde matrix to calculate and maintain checksum words • Using Gaussian Elimination to recover from failures • Using Galois Fields to perform arithmetic Calculating and Maintaining Checksum Words • Define a coding function Fi and an update function Gi,j Definition of Coding Function (1/2) • Define Fi to be a linear combination of the data words n ci = Fi (d1, d2, …, dn) = Σ dj fi,j j=1 – Vector representation c1 c2 C= : cm … … f1,n f2,n : : fm,1 fm,2 … : fm,n f1,1 f2,1 = f1,2 f2,2 d1 d2 : : dn = FD Definition of Coding Function (2/2) • Define F to be the m×n Vandermonde matrix fi, j = j i-1 f1,1 f2,1 F= f1,2 f2,2 … … : : fm,1 fm,2 … f1,n f2,n = : fm,n 1 1 : 1 1 … 2 … : 2m-1 … 1 n : nm-1 Definition of Update Function Gi,j (dj, dj’, ci) = ci + fi,j (dj’ – dj) – Subtract out the portion of the checksum word that corresponds to dj – Add the required amount for dj’ Recovering from Failures (1/4) • Define matrix A and E I A= F D E= C I : n×n identity matrix AD = E Recovering from Failures (2/4) • When devices fail, – Delete the corresponding rows from A and E AD = 1 0 : 0 1 1 : 1 0 1 : 0 1 2 : 2m-1 … … … … … 0 0 : 1 1 n : nm-1 d1 d2 : dn = d1 d2 : dn c1 c2 : cm =E Recovering from Failures (3/4) • When devices fail, – Delete the corresponding rows from A and E A’D = … 0 : 0 1 1 : 0 1 : 1 : 2m-1 … … … 0 : 1 1 : nm-1 d1 d2 : dn = d2 : dn c1 : cm = E’ Recovering from Failures (4/4) • Values of D are recovered from A’D = E’ using Gaussian Elimination E.g.) if m devices fail, D = (A’) -1E’ • A’ is a non-singular because F is Vandermonde matrix Problem with Arithmetic Operations (1/2) • Domain and range of the computation are binary words of a fixed length w – Not infinite precision real numbers Problem with Arithmetic Operations (2/2) • The algebra is correct when all the elements are infinite precision real numbers We must make sure that it is correct for the fixed-size words Naïve Solution and its Problem • Arithmetic over the integers modulo 2w Division is not defined for all pairs of elements E.g.) (3÷2) is undefined modulo 22 (=4) Our Solution • Perform addition/multiplication over a Galois Field Mapping Between Elements of GF(2w)and Binary Words • r(x) ∈ GF(2w) ⇔ a binary word b of size w such that i-th bit of b = the coefficient of xi in r(x) r(x) = awxw + aw-1xw-1 + … + a1x + a0 b = awaw-1 … a1a0 Examples of Mapping (1/3) • GF(22) = GF(2)[x]/x2+x+1 Generated Polynomial element element 0 0 Binary element 00 Decimal element 0 x1 1 01 1 x2 x 10 2 x3 x+1 11 3 Examples of Mapping (2/3) • GF(24) = GF(2)[x]/x4+x+1 Generated element 0 Polynomial element 0 Binary element 0000 Decimal element 0 x0 1 0001 1 x1 x 0010 2 x2 x2 0100 4 x3 x3 1000 8 x4 x+1 0011 3 x5 x2+x 0110 6 x6 x3+x2 1100 12 Example of Mapping (3/3) Generated element Polynomial element Binary element Decimal element x7 x3+x+1 1011 11 x8 x2+1 0101 5 x9 x3+x 1010 10 x10 x2+x+1 0111 7 x11 x3+x2+x 1110 14 x12 x3+x2+x+1 1111 15 x13 x3+x2+1 1101 13 x14 x3+1 1001 9 x15 1 0001 1 Addition/Subtraction over Binary Elements • XOR operation Binary elements 11 + 7 = 1011⊕ 0111 = 1100 = 12 GF(2w) 11 + 7 = (x3+x+1) + (x2+x+1) = x3+x2 = 12 Multiplication/Division over Binary Elements (1/4) 1. Covert the binary words to their polynomial elements 2. Multiply/divide the polynomials modulo a primitive polynomial q(x) 3. Covert the result back to a binary element Binary elements GF(2w) b1 * b2 = b3 r1(x) * r2(x) = r3(x) Multiplication/Division over Binary Elements (2/4) Use two logarithm tables • gflog – Maps a binary element b to power j such that xj is equivalent to b • gfilog – Maps from a power j to its binary element b i gflog[ i ] gfilog[ i ] 0 1 1 0 2 2 1 4 3 4 8 4 2 3 5 8 6 6 7 8 5 10 3 12 11 5 … GF(24) Multiplication/Division over Binary Elements (3/4) 1. Convert each binary element to its discrete logarithm – By looking up gflog 2. Add/Subtract the logarithms modulo 2w-1 ※ x2^w-1 = q(x) 3. Covert result back to a binary element – By looking up gfilog Multiplication/Division over Binary Elements (4/4) Binary elements 3 * 7 = gfilog[gflog[3]+gflog[7]] = gfilog[4+10] =9 GF(2w) 3 * 7 = (x+1) * (x2+x+1) = x4+10 = x3+1 = 9 Summary of Algorithm 1. 2. 3. 4. 5. Choose w such that 2w > n + m Set up the tables gflog and gfilog Set up the matrix F Calculate words of the checksum devices If any number of devices up to m fails, i. Choose any n of the remaining devices ii. Construct the matrix A’ and E’ iii. Solve for D in A’D = E’ Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields An Example • Suppose n=3 and m=4 Step 1~3 • Choose w to be 4 ※ 2w > n + m である必要がある • Set up gflog and gfilog • Set up the 3×3 matrix F – Defined over GF(24) 10 F = 11 12 20 21 22 30 31 = 32 1 1 1 1 2 4 1 3 5 Step 4 • Calculate each word of the checksum devices using FD=C – d1=3, d2=13, d3=9 c1 = (1)(d1) ⊕ (1)(d2) ⊕ (1)(d3) = 7 c2 = (1)(d1) ⊕ (2)(d2) ⊕ (3)(d3) = 2 c3 = (1)(d1) ⊕ (4)(d2) ⊕ (5)(d3) = 9 Step 5 • Change d2 to 1 – D2 send the value (1-13) = (0001⊕1101) = 12 c1 = 7 ⊕ (1)(12) = 11 c2 = 2 ⊕ (2)(12) = 9 c3 = 9 ⊕ (4)(12) = 12 Step 6 • D2, D3, and C3 are lost AD = 1 0 0 1 1 1 0 1 0 1 2 4 0 0 1 1 3 5 3 1 D= 9 11 9 12 =E Step 7 • D2, D3, and C3 are lost A’D = 1 1 1 0 1 2 0 1 3 3 D = 11 = E’ 9 Step 7 • Recovery D = (A’)-1E’ = 1 2 3 0 3 2 0 1 1 3 11 = 9 c3 = (1)(3) ⊕ (4)(1) ⊕ (5)(9) = 12 3 1 9 Summary • Reed-Solomon Coding – Vandermonde matrix for checksum calculation – Gaussian Elimination for failure recovery – Arithmetic over Galois Fields ※大規模P2Pシステムに本当に適応可能? Reference • A tutorial on Reed-Solomon Coding for Fault-tolerance in RAID-like Systems – James S. Plank – Software – Practice and Experience, Vol. 27(9), 996-1012 (1997) Outline • • • • Problem Specification General Strategy Overview of Reed-Solomon Coding An Example • Appendix: Galois Fields Galois Fields • A field GF(n) is a set of n elements closed under addition and multiplication – Every element has an additive and multiplicative inverse • Except for the 0 element which has no multiplicative inverse Examples of Galois Fields (1/2) • GF(2) = { 0, 1 } – Addition/multiplication are performed on modulo 2 • GF(n) = { 0, 1, …, n-1 } – where n is a prime number – Addition/multiplication are performed on modulo n ※ { 0,1,2,3 } is not a Galois field – 2 has no multiplicative inverse Examples of Galois Fields (2/2) • GF(2w) = GF(2)[x]/q(x) – Elements are polynomials whose coefficients belong to GF(2) – Arithmetic module a primitive function q(x) • Degree of q(x) = w • Coefficients of q(x) belong to GF(2) E.g.) GF(22) = GF(2)[x]/x2+x+1 = { 0, 1, x, x+1 } ------------------------------------------------------------------------ Implementation • RAID controller D1 … Dn C1 … Cm CPU • Distributed checkpoint system D1 CPU … Dn C1 CPU CPU network … Cm CPU Failure Model • Erasure model – When a device fails, it shutdowns – System recognizes this shutdown C.f.) Error model – Device failure is manifested by storing/retrieving incorrect values
© Copyright 2024 ExpyDoc