Document

Introduction to Erasure coding
Kenji Kaneda
発表の動機と目的
• 耐故障ファイルシステム関係の論文中に
Erasure codingという用語がよく現れる
例)Oceanstore, RAID, …
• 詳細については余りよく知らない
– アルゴリズムの効率は?
– 実装にかかる手間はどれくらい?
Erasure codingの一種である
Reed-Solomon Codingについて調べる
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
Problem Specification (1/2)
• Given
– n Data devices (D1, D2, …, Dn)
• Each holds k bytes
– m Checksum devices (C1, C2, …, Cm)
• Each holds k bytes
D1
D2
m=2 C1
C2
n=8
D3
D4
D5
D6
D7
D8
Problem Specification (2/2)
• Goal
– Define the calculation of each Ci such that
if any m of D1, D2, …, Dn, C1, C2, …, Cm fail,
then the failed devices can be reconstructed
from the non-failed devices
D1
D2
m=2 C1
C2
n=8
D3
D4
D5
D6
D7
D8
An Example Configuration
• “n+1-parity” coding (RAID Level 5)
– m=1
– c1,j = d1,j ⊕ d2,j ⊕ … ⊕ dn,j
where c1,j = j-th byte of C1
and
di,j = j-th byte of Di
D1
C1
D2
…
Dn
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
General Strategy (1/4)
• Partition storage devices
…
D1
D2
Dn
…
C1
C2
Cm
General Strategy (2/4)
• Initialize checksum devices
…
D1
D2
Dn
…
C1
C2
Cm
General Strategy (3/4)
• Update data and checksum devices
update
…
D1
D2
update
Dn
update
update
…
C1
C2
Cm
General Strategy (4/4)
• Recover storage devices from failures
…
D1
D2
Dn
…
C1
C2
Cm
Partitioning of Devices (1/2)
• Break up each device into words
– Size of each word is w bits
• w is chosen by a programmer
w bits
k bytes
Di
Partitioning of Devices (2/2)
• Henceforth we assume that each device
holds just 1 word (for simplicity)
– data words: d1, d2, …, dn
– checksum words: c1, c2, …, cm
d1
d2
D1
D2
c1
c2
C1
C2
…
dn
Dn
…
cm
Cm
Calculation of Checksum
• Define a coding function Fi (d1, d2, …, dn)
– Calculates a checksum word on Ci
E.g.) F1 (d1, d2, …, dn) = d1 ⊕ d2 ⊕ … ⊕ dn
d1
d2
D1
D2
c1=F1(d1, …, dn)
c2=F2(d1, …, dn)
C1
C2
…
dn
Dn
…
cm=Fm(d1, …, dn)
Cm
Update of Checksum
• Define an update function Gi,j (dj, dj’, ci)
– Calculates a checksum word on Ci
when a checksum word on Ci is ci and
a data word on Dj is updated from dj to dj’
E.g.) G1,j (dj, dj’, ci) = c1 ⊕ dj ⊕ dj’
d1
d2d ’
2
D1
D2
…
c1’=G1,2c1(d2,d2’,c1) c2’=G2,2c2(d2,d2’,c2) …
C1
C2
dn
Dn
cm(d ,d ’,c )
c3’=G3,2
2 2
3
Cm
Recovery from Failure
1. Restore the words in any failed data
device Dj from the words in the nonfailed devices
E.g.) dj = d1 ⊕ … ⊕ dj-1 ⊕ dj+1 ⊕ … ⊕ dn ⊕ c1
2. Re-compute any failed checksum
devices Ci with Fi
Problem Restatement
• Given n data words d1, d2, …, dn, all of
size w
• Define functions F and G to calculate
and maintain the checksum words
c1, c2, …, cm
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
Overview of
Reed-Solomon Coding
• Using the Vandermonde matrix to
calculate and maintain checksum words
• Using Gaussian Elimination to recover
from failures
• Using Galois Fields to perform
arithmetic
Calculating and Maintaining
Checksum Words
• Define a coding function Fi and
an update function Gi,j
Definition of Coding Function
(1/2)
• Define Fi to be a linear combination of
the data words
n
ci = Fi (d1, d2, …, dn) = Σ dj fi,j
j=1
– Vector representation
c1
c2
C=
:
cm
…
…
f1,n
f2,n
:
:
fm,1 fm,2 …
:
fm,n
f1,1
f2,1
=
f1,2
f2,2
d1
d2
:
:
dn
= FD
Definition of Coding Function
(2/2)
• Define F to be the m×n Vandermonde
matrix
fi, j = j i-1
f1,1
f2,1
F=
f1,2
f2,2
…
…
:
:
fm,1 fm,2 …
f1,n
f2,n
=
:
fm,n
1
1
:
1
1
…
2
…
:
2m-1 …
1
n
:
nm-1
Definition of Update Function
Gi,j (dj, dj’, ci) = ci + fi,j (dj’ – dj)
– Subtract out the portion of the checksum
word that corresponds to dj
– Add the required amount for dj’
Recovering from Failures (1/4)
• Define matrix A and E
I
A=
F
D
E=
C
I : n×n identity matrix
AD = E
Recovering from Failures (2/4)
• When devices fail,
– Delete the corresponding rows from A and E
AD =
1
0
:
0
1
1
:
1
0
1
:
0
1
2
:
2m-1
…
…
…
…
…
0
0
:
1
1
n
:
nm-1
d1
d2
:
dn
=
d1
d2
:
dn
c1
c2
:
cm
=E
Recovering from Failures (3/4)
• When devices fail,
– Delete the corresponding rows from A and E
A’D =
…
0
:
0
1
1
:
0
1
:
1
:
2m-1 …
…
…
0
:
1
1
:
nm-1
d1
d2
:
dn
=
d2
:
dn
c1
:
cm
= E’
Recovering from Failures (4/4)
• Values of D are recovered from A’D = E’
using Gaussian Elimination
E.g.) if m devices fail, D = (A’) -1E’
• A’ is a non-singular because F is Vandermonde
matrix
Problem with
Arithmetic Operations (1/2)
• Domain and range of the computation
are binary words of a fixed length w
– Not infinite precision real numbers
Problem with
Arithmetic Operations (2/2)
• The algebra is correct when all the
elements are infinite precision real
numbers
 We must make sure that it is correct
for the fixed-size words
Naïve Solution and its Problem
• Arithmetic over the integers modulo 2w
Division is not defined for all pairs of
elements
E.g.) (3÷2) is undefined modulo 22 (=4)
Our Solution
• Perform addition/multiplication
over a Galois Field
Mapping Between Elements of
GF(2w)and Binary Words
• r(x) ∈ GF(2w)
⇔ a binary word b of size w such that
i-th bit of b = the coefficient of xi in r(x)
r(x) = awxw + aw-1xw-1 + … + a1x + a0
b
= awaw-1 … a1a0
Examples of Mapping (1/3)
• GF(22) = GF(2)[x]/x2+x+1
Generated Polynomial
element
element
0
0
Binary
element
00
Decimal
element
0
x1
1
01
1
x2
x
10
2
x3
x+1
11
3
Examples of Mapping (2/3)
• GF(24) = GF(2)[x]/x4+x+1
Generated
element
0
Polynomial
element
0
Binary
element
0000
Decimal
element
0
x0
1
0001
1
x1
x
0010
2
x2
x2
0100
4
x3
x3
1000
8
x4
x+1
0011
3
x5
x2+x
0110
6
x6
x3+x2
1100
12
Example of Mapping (3/3)
Generated
element
Polynomial
element
Binary
element
Decimal
element
x7
x3+x+1
1011
11
x8
x2+1
0101
5
x9
x3+x
1010
10
x10
x2+x+1
0111
7
x11
x3+x2+x
1110
14
x12
x3+x2+x+1
1111
15
x13
x3+x2+1
1101
13
x14
x3+1
1001
9
x15
1
0001
1
Addition/Subtraction
over Binary Elements
• XOR operation
Binary elements
11 + 7 = 1011⊕ 0111 = 1100 = 12
GF(2w)
11 + 7 = (x3+x+1) + (x2+x+1) = x3+x2 = 12
Multiplication/Division
over Binary Elements (1/4)
1. Covert the binary words to their
polynomial elements
2. Multiply/divide the polynomials
modulo a primitive polynomial q(x)
3. Covert the result back to a binary
element
Binary elements
GF(2w)
b1 * b2 =
b3
r1(x) * r2(x) = r3(x)
Multiplication/Division
over Binary Elements (2/4)
Use two logarithm tables
• gflog
– Maps a binary element b to power j
such that xj is equivalent to b
• gfilog
– Maps from a power j to its binary element b
i
gflog[ i ]
gfilog[ i ]
0
1
1
0
2
2
1
4
3
4
8
4
2
3
5
8
6
6 7 8
5 10 3
12 11 5
…
GF(24)
Multiplication/Division
over Binary Elements (3/4)
1. Convert each binary element to its
discrete logarithm
– By looking up gflog
2. Add/Subtract the logarithms modulo 2w-1
※ x2^w-1 = q(x)
3. Covert result back to a binary element
– By looking up gfilog
Multiplication/Division
over Binary Elements (4/4)
Binary elements
3 * 7 = gfilog[gflog[3]+gflog[7]]
= gfilog[4+10]
=9
GF(2w)
3 * 7 = (x+1) * (x2+x+1) = x4+10 = x3+1 = 9
Summary of Algorithm
1.
2.
3.
4.
5.
Choose w such that 2w > n + m
Set up the tables gflog and gfilog
Set up the matrix F
Calculate words of the checksum
devices
If any number of devices up to m fails,
i. Choose any n of the remaining devices
ii. Construct the matrix A’ and E’
iii. Solve for D in A’D = E’
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
An Example
• Suppose n=3 and m=4
Step 1~3
• Choose w to be 4
※ 2w > n + m である必要がある
• Set up gflog and gfilog
• Set up the 3×3 matrix F
– Defined over GF(24)
10
F = 11
12
20
21
22
30
31 =
32
1
1
1
1
2
4
1
3
5
Step 4
• Calculate each word of the checksum
devices using FD=C
– d1=3, d2=13, d3=9
c1 = (1)(d1) ⊕ (1)(d2) ⊕ (1)(d3) = 7
c2 = (1)(d1) ⊕ (2)(d2) ⊕ (3)(d3) = 2
c3 = (1)(d1) ⊕ (4)(d2) ⊕ (5)(d3) = 9
Step 5
• Change d2 to 1
– D2 send the value (1-13) = (0001⊕1101) =
12
c1 = 7 ⊕ (1)(12) =
11
c2 = 2 ⊕ (2)(12) = 9
c3 = 9 ⊕ (4)(12) =
12
Step 6
• D2, D3, and C3 are lost
AD =
1
0
0
1
1
1
0
1
0
1
2
4
0
0
1
1
3
5
3
1
D= 9
11
9
12
=E
Step 7
• D2, D3, and C3 are lost
A’D =
1
1
1
0
1
2
0
1
3
3
D = 11 = E’
9
Step 7
• Recovery
D = (A’)-1E’ =
1
2
3
0
3
2
0
1
1
3
11 =
9
c3 = (1)(3) ⊕ (4)(1) ⊕ (5)(9) =
12
3
1
9
Summary
• Reed-Solomon Coding
– Vandermonde matrix for checksum calculation
– Gaussian Elimination for failure recovery
– Arithmetic over Galois Fields
※大規模P2Pシステムに本当に適応可能?
Reference
• A tutorial on Reed-Solomon Coding for
Fault-tolerance in RAID-like Systems
– James S. Plank
– Software – Practice and Experience,
Vol. 27(9), 996-1012 (1997)
Outline
•
•
•
•
Problem Specification
General Strategy
Overview of Reed-Solomon Coding
An Example
• Appendix: Galois Fields
Galois Fields
• A field GF(n) is a set of n elements
closed under addition and multiplication
– Every element has an additive and
multiplicative inverse
• Except for the 0 element which has no
multiplicative inverse
Examples of Galois Fields
(1/2)
• GF(2) = { 0, 1 }
– Addition/multiplication are performed on
modulo 2
• GF(n) = { 0, 1, …, n-1 }
– where n is a prime number
– Addition/multiplication are performed on
modulo n
※ { 0,1,2,3 } is not a Galois field
– 2 has no multiplicative inverse
Examples of Galois Fields
(2/2)
• GF(2w) = GF(2)[x]/q(x)
– Elements are polynomials whose
coefficients belong to GF(2)
– Arithmetic module a primitive function q(x)
• Degree of q(x) = w
• Coefficients of q(x) belong to GF(2)
E.g.) GF(22) = GF(2)[x]/x2+x+1
= { 0, 1, x, x+1 }
------------------------------------------------------------------------
Implementation
• RAID controller
D1
…
Dn
C1
… Cm
CPU
• Distributed checkpoint system
D1
CPU
…
Dn
C1
CPU CPU
network
… Cm
CPU
Failure Model
• Erasure model
– When a device fails, it shutdowns
– System recognizes this shutdown
C.f.) Error model
– Device failure is manifested by
storing/retrieving incorrect values