Information Theory

exercise in the previous class (1)
Consider an “odd” parity check code C whose codewords are
(x1, …, xk, p) with p = x1+…+xk+1. Is C a linear code?
No.
x1 = 1, x2 = x3 = ... = xk = 0 ⇒ p = 0, so 100...00 is a codeword
x2 = 1, x1 = x3 = ... = xk = 0 ⇒ p = 0, so 010...00 is a codeword
the sum of the two codewords is 110...0, which is not a codeword (its parity bit should be 1)
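The argument above can be checked numerically (a minimal sketch; the helper name `encode_odd_parity` is an assumption):

```python
# Minimal sketch (names are assumptions): verify that the "odd" parity
# code is not closed under addition, hence not linear.
k = 4

def encode_odd_parity(x):
    """Append the odd-parity bit p = x1 + ... + xk + 1 (mod 2)."""
    return x + [(sum(x) + 1) % 2]

u = encode_odd_parity([1, 0, 0, 0])      # 10000, parity bit 0
v = encode_odd_parity([0, 1, 0, 0])      # 01000, parity bit 0
s = [(a + b) % 2 for a, b in zip(u, v)]  # componentwise sum 11000

codewords = {tuple(encode_odd_parity([(i >> j) & 1 for j in range(k)]))
             for i in range(2 ** k)}
print(tuple(u) in codewords, tuple(v) in codewords, tuple(s) in codewords)
# -> True True False: the sum of two codewords is not a codeword
```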
exercise in the previous class (2)
Construct a 2D code for 6-bit information (a1, ..., a6) as follows.
determine the generator and parity check matrices
encode 011001 using the generator matrix
correct an error in the sequence 110111001010
arrange the information bits in a 2×3 array and attach parities:

  a1 a2 a3 | p1
  a4 a5 a6 | p2
  q1 q2 q3 | r

(a1, ..., a6) → (a1, ..., a6, p1, p2, q1, q2, q3, r)

parity symbols:
  p1 = a1 + a2 + a3
  p2 = a4 + a5 + a6
  q1 = a1 + a4
  q2 = a2 + a5
  q3 = a3 + a6
  r  = a1 + a2 + a3 + a4 + a5 + a6
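The encoding rule above can be sketched as follows (the function name `encode_2d` is an assumption, not from the lecture):

```python
# Sketch of the 2D encoder described above (function name is an assumption).
def encode_2d(a):
    """Encode information bits (a1, ..., a6) into the 12-bit 2D codeword."""
    a1, a2, a3, a4, a5, a6 = a
    p1 = (a1 + a2 + a3) % 2                 # row parities
    p2 = (a4 + a5 + a6) % 2
    q1 = (a1 + a4) % 2                      # column parities
    q2 = (a2 + a5) % 2
    q3 = (a3 + a6) % 2
    r = (a1 + a2 + a3 + a4 + a5 + a6) % 2   # overall parity
    return [a1, a2, a3, a4, a5, a6, p1, p2, q1, q2, q3, r]

print(encode_2d([0, 1, 1, 0, 0, 1]))
# -> [0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1]
```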
exercise in the previous class (3)
G: 100000 101001
010000 100101
001000 100011
000100 011001
000010 010101
000001 010011
(0 1 1 0 0 1)G
= (0 1 1 0 0 1 0 1 0 1 0 1)
the parity equations (right) have the coefficient matrix (left):

  111000     p1 = a1 + a2 + a3
  000111     p2 = a4 + a5 + a6
  100100     q1 = a1 + a4
  010010     q2 = a2 + a5
  001001     q3 = a3 + a6
  111111     r  = a1 + a2 + a3 + a4 + a5 + a6

transposed, it gives the right half of G; used as is, the left half of H:

H: 111000 100000
   000111 010000
   100100 001000
   010010 000100
   001001 000010
   111111 000001
H(1 1 0 1 1 1 0 0 1 0 1 0)^T = (0 1 1 0 0 1)^T = the 4th column of H
⇒ the error is at the 4th bit: 110111001010 → 110011001010
in the previous class...
linear codes: definition, encoding, decoding
a one-bit error at the i-th symbol position
⇔ the syndrome equals the i-th column vector of H
if some column vectors of H are the same,
then we cannot correct every one-bit error in a codeword.
if all column vectors of H are different,
then we can correct any one-bit error in a codeword.
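The syndrome rule above can be sketched in Python; the 3×6 matrix H below is an assumed example whose columns are all distinct:

```python
# Sketch of the rule above: a one-bit error at position i makes the
# syndrome H r^T equal the i-th column of H (this H is an assumed example).
H = [[1, 0, 1, 1, 0, 0],
     [1, 1, 0, 0, 1, 0],
     [0, 1, 1, 0, 0, 1]]

def syndrome(H, r):
    return [sum(hij * rj for hij, rj in zip(row, r)) % 2 for row in H]

def correct_one_bit(H, r):
    """Flip bit i if the syndrome equals the i-th column of H."""
    s = syndrome(H, r)
    if all(b == 0 for b in s):
        return r                            # syndrome zero: accept as-is
    for i in range(len(r)):
        if [row[i] for row in H] == s:      # i-th column matches
            return r[:i] + [1 - r[i]] + r[i + 1:]
    return None                             # no matching column: uncorrectable

print(correct_one_bit(H, [1, 1, 1, 1, 0, 0]))  # -> [1, 1, 1, 0, 0, 0]
```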
design of error correcting codes
Construct a parity check matrix whose column vectors are all different;
then we have a one-bit error correcting code.
OK examples:
H= 101100
110010
011001
H= 1010011
0100111
1101001
NG examples:
H= 110100
101010
010001
H= 1010011
1100111
1101001
C = {v | Hv^T = 0 (mod 2)};
the discussion becomes easier
if the right submatrix of H is an identity matrix...
H= 101001
010011
110100
010101
construction of a code
the parity equations (right) have the coefficient matrix (left):

  101     p1 = x1 + x3
  110     p2 = x1 + x2
  011     p3 = x2 + x3

used as is, it gives the left half of H; transposed, the right half of G:

H: 101 100     G: 100 110
   110 010        010 011
   011 001        001 101
codewords:
  000000, 001101, 010011, 011110,
  100110, 101011, 110101, 111000
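The codeword list above can be reproduced by computing xG (mod 2) for all 2^3 information vectors (a minimal sketch):

```python
from itertools import product

# Generate all codewords xG (mod 2) from the generator matrix above.
G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 0, 1]]

def encode(G, x):
    n = len(G[0])
    return [sum(x[i] * G[i][j] for i in range(len(x))) % 2 for j in range(n)]

codewords = sorted(''.join(map(str, encode(G, x)))
                   for x in product([0, 1], repeat=3))
print(codewords)
# -> ['000000', '001101', '010011', '011110',
#     '100110', '101011', '110101', '111000']
```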
the “shape” of a check matrix
a parity check matrix with m rows and n columns
⇔ a code with:
  length = n
  # of information symbols = n − m (= k)
  # of parity symbols = m

example with m = 5, n = 9:

H: 1100 10000
   0011 01000
   1010 00100
   0101 00010
   1111 00001

code length 9 = 4 information symbols + 5 parity symbols

a "vertically longer" H means more parity symbols in a codeword,
i.e. fewer information symbols
⇒ not efficient, not favorable...
Hamming code
To design a one-bit error correcting code with small redundancy,
construct a check matrix that is as "horizontally long" as possible (all columns different).
Richard Hamming
1915-1998
Hamming code
determine m, # of parity check symbols
list up all nonzero vectors with length m
use the vectors as columns of H
(any order is OK, but let the right-submatrix be an identity)
m = 3:

H: 1110 100     G: 1000 111
   1101 010        0100 110
   1011 001        0010 101
                   0001 011

length 7 = 4 information + 3 parity
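A sketch of encoding and one-bit error correction with the H and G above (helper names are assumptions):

```python
# Sketch of the (7,4) Hamming code with the H and G above.
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0, 1, 0],
     [1, 0, 1, 1, 0, 0, 1]]
G = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1]]

def encode(x):
    return [sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def decode(r):
    """Correct a single-bit error via the syndrome, then strip parity."""
    s = [sum(row[j] * r[j] for j in range(7)) % 2 for row in H]
    if s != [0, 0, 0]:
        i = [[row[j] for row in H] for j in range(7)].index(s)
        r = r[:i] + [1 - r[i]] + r[i + 1:]
    return r[:4]

c = encode([1, 0, 1, 1])
c[2] ^= 1                     # inject a one-bit error
print(decode(c))              # -> [1, 0, 1, 1]
```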
Parameters of Hamming code
Hamming code
determine m, # of parity check symbols
design H to have 2^m − 1 different column vectors
H has m rows and 2^m − 1 columns
code length n = 2^m − 1
# of information symbols k = 2^m − 1 − m
# of parity symbols = m
(n, k) code:
code with length n, and
k information symbols
m: 2  3  4   5   6   7
n: 3  7  15  31  63  127
k: 1  4  11  26  57  120
comparison of codes
two codes which can correct one-bit errors:
(7, 4) Hamming code
(9, 4) 2D code
which is "better"?
Hamming code is more efficient (smaller redundancy)
Hamming code is more reliable
probability of correct transmission over a BSC with error prob. p, at p = 0.1:
  Hamming code: (1 − p)^7 + 7p(1 − p)^6 ≈ 0.85
  2D code:      (1 − p)^9 + 9p(1 − p)^8 ≈ 0.77
"shorter is better"
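The two success probabilities can be recomputed directly:

```python
# Recompute the success probabilities above for a BSC with error prob. p = 0.1:
# the codeword survives if there are 0 errors, or exactly 1 (correctable) error.
p = 0.1
hamming = (1 - p) ** 7 + 7 * p * (1 - p) ** 6   # (7,4) Hamming code
two_d = (1 - p) ** 9 + 9 * p * (1 - p) ** 8     # (9,4) 2D code
print(round(hamming, 2), round(two_d, 2))        # -> 0.85 0.77
```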
codes better than Hamming code?
(7, 4) Hamming code
3 parity bits are added to correct any one-bit error
Is there a one-bit error correcting (6, 4) code, with only 2 parity bits?
No. Assume that such a code exists; then...
there are 2^4 = 16 codewords
# of vectors decoded to a given codeword = 1 + 6 = 7
# of vectors decoded to some codeword = 7 × 16 = 112
# of vectors of length 6 = 2^6 = 64, which is < 112
contradiction!
Hamming code is perfect
(7, 4) Hamming code
there are 2^4 = 16 codewords
# of vectors decoded to a given codeword = 1 + 7 = 8
# of vectors decoded to some codeword = 8 × 16 = 128
# of vectors of length 7 = 2^7 = 128
all 128 vectors are exactly
partitioned into 16 classes of 8 vectors
Hamming code is a perfect one-bit error correcting code:
  2^k ( C(n, 0) + C(n, 1) ) = 2^n
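The counting above can be verified numerically: with a systematic generator matrix of the (7, 4) Hamming code, the radius-1 balls around the 16 codewords exactly cover {0, 1}^7 (a sketch; helper names are assumptions):

```python
from itertools import product

# Check the counting above: 16 codewords, each with a ball of 8 vectors,
# exactly cover all 2^7 = 128 binary vectors of length 7.
G = [[1, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1]]

codewords = {tuple(sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
             for x in product([0, 1], repeat=4)}

def ball(u):
    """u together with the 7 vectors at Hamming distance 1 from u."""
    return {u} | {tuple(b ^ (1 if j == i else 0) for j, b in enumerate(u))
                  for i in range(7)}

covered = set().union(*(ball(u) for u in codewords))
print(len(codewords), len(covered))   # -> 16 128
```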
advanced topic: multi-bit errors?
Hamming code is a perfect one-bit error correcting code.
Are there codes which correct two or more errors?
Yes, there are many...
one-bit error: syndrome = one column vector of H
two-bit error: syndrome = the sum of two column vectors of H
if different combinations of up to t columns of H result in different sums,
⇒ the code corrects t-bit errors.
advanced topic: two-bit error correcting code
1

1
1
H
1
0

0
ℎ𝑖 + ℎ𝑗 ≠ ℎ𝑖′ + ℎ𝑗′ if
1
1
0
0
1
1
𝑖, 𝑗
1 0 0
0 1 0
0 0 1
0 0 0
0 0 0
0 0 0
≠ {𝑖 ′ , 𝑗 ′ }
0
0
0
1
0
0
0
0
0
0
1
0
0

0
0
.
0
0

1
 different two-bits errors results in different syndromes
received 11111000 ⇒ the syndrome is (110111)^T = h2 + h6
⇒ errors at the 2nd and 6th bits
⇒ 10111100 must have been transmitted
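A sketch of two-bit error correction with this H: precompute a table from syndromes to error positions for all 0-, 1- and 2-bit error patterns (helper names are assumptions):

```python
from itertools import combinations

# Sketch: map the syndrome of every 0-, 1-, and 2-bit error pattern to its
# error positions; by the condition above, all these syndromes are distinct.
H = [[1, 1, 1, 0, 0, 0, 0, 0],
     [1, 1, 0, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0, 0, 0, 1]]
n = 8

def syndrome(r):
    return tuple(sum(row[j] * r[j] for j in range(n)) % 2 for row in H)

table = {}
for k in (0, 1, 2):
    for pos in combinations(range(n), k):
        e = [1 if i in pos else 0 for i in range(n)]
        table[syndrome(e)] = pos

r = [1, 1, 1, 1, 1, 0, 0, 0]       # received 11111000
pos = table[syndrome(r)]
for i in pos:
    r[i] ^= 1                      # flip the erroneous bits
print(pos, r)                      # -> (1, 5) [1, 0, 1, 1, 1, 1, 0, 0]
```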
ability of the code
The error-correcting capability of a code is determined by
the relations among the column vectors of its parity check matrix.
It is not easy to examine all combinations of columns;
a handier and easier measure is needed.
For linear codes, we can use
minimum distance, or minimum weight
similarity of vectors
a codeword u is sent, errors occur, and v is received:
over a practical channel, the distance between u and v tends to be small.
BSC with error prob. 0.1
u = 000
v = 000, with probability 0.729
v = 001, with probability 0.081
v = 011, with probability 0.009
v = 111, with probability 0.001
If there is another codeword u′ near u,
then v = u′ occurs with non-negligible probability:
codewords far apart ... safe
codewords close together ... not safe
Hamming distance
a=(a1, a2,..., an), b=(b1, b2,..., bn): binary vectors
the Hamming distance between a and b, dH(a, b)
= the number of symbol positions in which a and b differ

  dH(0100, 1101) = 2    dH(000, 011) = 2
  dH(000, 111) = 3      dH(011, 011) = 0

property: dH(a, b) = dH(a + b, 0)
If a vector u of length n is sent over a BSC with error prob. p,
then a vector v with dH(u, v) = i is received with prob. p^i (1 − p)^(n−i).
the distance and the probability are inversely related
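A minimal sketch checking the distance examples and the BSC probability formula above (function names are assumptions):

```python
# Check the Hamming-distance examples and the BSC probability formula above.
def d_H(a, b):
    """Hamming distance: number of differing symbol positions."""
    return sum(x != y for x, y in zip(a, b))

def bsc_prob(n, i, p):
    """Probability of receiving a particular vector at distance i over a BSC."""
    return p ** i * (1 - p) ** (n - i)

print(d_H("0100", "1101"), d_H("000", "011"), d_H("000", "111"))  # -> 2 2 3
print(round(bsc_prob(3, 1, 0.1), 3))                              # -> 0.081
```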
Hamming distance between codewords
code with length 4 ...
vectors = vertices of a 4-dimensional hyper-cube
codewords = subset of vertices
C1 = {0000, 1100, 0011, 1111}
  two or more edges between any two codewords
  ... good
C2 = {0000, 0001, 1110, 1111}
  some codewords are side-by-side (adjacent vertices)
  ... bad
minimum distance
the minimum Hamming distance of a code C:
  dmin = min{ dH(a, b) | a, b ∈ C, a ≠ b }
C1={0000, 1100, 0011, 1111}
dmin = 2
C2={0000, 0001, 1110, 1111}
dmin = 1
computation of the minimum distance
1 0 0 1 1 0


G

0
1
0
1
0
1
consider a linear code whose generator matrix is


0 0 1 0 1 1


000000
001011
010101
011110
100110
101101
110011
111000
         000000 001011 010101 011110 100110 101101 110011 111000
000000     0      3      3      4      3      4      4      3
001011     3      0      4      3      4      3      3      4
010101     3      4      0      3      4      3      3      4
011110     4      3      3      0      3      4      4      3
100110     3      4      4      3      0      3      3      4
101101     4      3      3      4      3      0      4      3
110011     4      3      3      4      3      4      0      3
111000     3      4      4      3      4      3      3      0
dmin = 3 for this code
Do we need to consider all of the 2^k × 2^k combinations?
minimum Hamming weight
the Hamming weight of a vector u: wH(u) = dH(u, 0) ... the number of 1s
the minimum Hamming weight of a code C:
  wmin = min{ wH(u) | u ∈ C, u ≠ 0 }
Lemma: if C is linear, then wmin = dmin.
proof of wmin ≤ dmin:
  let u and v be codewords with dmin = dH(u, v).
  u + v ∈ C, and wmin ≤ wH(u + v) = dH(u + v, 0) = dH(u, v) = dmin.
proof of dmin ≤ wmin:
  let u be a codeword with wmin = wH(u).
  dmin ≤ dH(u, 0) = wH(u) = wmin.
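The lemma can be confirmed numerically on the small code from the earlier minimum-distance slide (a sketch):

```python
from itertools import product, combinations

# Confirm wmin = dmin for the small linear code generated by the G
# from the "computation of the minimum distance" slide.
G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]

def encode(x):
    return tuple(sum(x[i] * G[i][j] for i in range(3)) % 2 for j in range(6))

C = [encode(x) for x in product([0, 1], repeat=3)]
d_min = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(C, 2))
w_min = min(sum(u) for u in C if any(u))
print(d_min, w_min)   # -> 3 3: for linear codes w_min = d_min
```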
examples of minimum Hamming weight
(9, 4) 2D code: the minimum Hamming weight is 4

  000000000 (0)   010010011 (4)   100010101 (4)   110000110 (4)
  000101011 (4)   010111000 (4)   100111110 (6)   110101101 (6)
  001001101 (4)   011011110 (6)   101011000 (4)   111001011 (6)
  001100110 (4)   011110101 (6)   101110011 (6)   111100000 (4)
(7, 4) Hamming code: the minimum Hamming weight is 3

  0000000 (0)   1000101 (3)   0100111 (4)   0010110 (3)
  1100010 (3)   1010011 (4)   0110001 (3)   1110100 (4)
  0001011 (3)   1001110 (4)   0101100 (3)   1101001 (4)
  0011101 (4)   1011000 (3)   0111010 (4)   1111111 (7)
general case of Hamming code
lemma: The minimum Hamming weight of a Hamming code is 3.
proof sketch:
Let H = (h1, ..., hn) be a parity check matrix:
{h1, ..., hn} = the set of all nonzero vectors
if codeword u with weight 1, then HuT = hi = 0...contradiction
if codeword u with weight 2, then HuT = hi + hj= 0
...this means that hi = hj, contradiction
no codewords with weight 1 or 2
proof (cont'd)
lemma: The minimum Hamming weight of a Hamming code is 3.
proof sketch:
Let H = (h1, ..., hn) be a parity check matrix:
{h1, ..., hn} = the set of all nonzero vectors
Choose x and y arbitrarily (x ≠ y); since hx + hy is nonzero, it appears
among the columns, so we can choose z with hx + hy = hz.
Let u be the vector with 1s at the x-th, y-th and z-th positions;
then Hu^T = hx + hy + hz = 0, meaning that u ∈ C.
⇒ codewords of weight 3 do exist
minimum distance and error correction
What does dmin=3 mean?
Any two codewords differ in three or more symbols.
At least three bit errors are needed to change
a codeword into a different codeword.
for distinct codewords u, v ∈ C:
  {u′ | dH(u, u′) = 1} ∩ {v′ | dH(v, v′) = 1} = ∅
We can distinguish
the result of a one-bit error from a codeword u, and
the result of a one-bit error from another codeword v.
decoding territory
dmin = 3: define territories around codewords
  radius = 1 ... territories do not overlap
  radius = 2 ... territories do overlap
rule of error correction:
  if a received vector r falls in the territory of
  a codeword u, then r is decoded to u.
if dmin = 3, then the radius of each territory is at most 1
⇒ the code can correct up to one-bit errors
general discussion
define tmax = ⌊(dmin − 1) / 2⌋    (⌊x⌋ is the largest integer ≤ x)

  dmin = 7 ⇒ tmax = 3
  dmin = 8 ⇒ tmax = 3

territories do not overlap if the radius ≤ tmax
⇒ C can correct up to tmax bit errors in a codeword.
examples
dmin: 3  4  5  6  7  8
tmax: 1  1  2  2  3  3
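The table above follows directly from the formula (a sketch):

```python
# Compute t_max = floor((d_min - 1) / 2) for each d_min in the table above.
rows = [(d_min, (d_min - 1) // 2) for d_min in range(3, 9)]
for d_min, t_max in rows:
    print(f"d_min = {d_min}: t_max = {t_max}")
```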
about tmax
tmax is the maximum radius that is allowed;
we can also use smaller territories with radius t < tmax.
vectors which do not belong to any territory:
  errors are detected, but not corrected
advantage and disadvantage
depending on where a received vector falls, it is:
  decoded to the correct codeword (territory of the sent codeword),
  subject to error detection only (outside every territory), or
  decoded to a wrong codeword (territory of another codeword).

radius | correct | detect only | wrong
large  | large   | small       | large
small  | small   | large       | small
The radius should be controlled according to applications.
familiar image?
think of throwing darts at a target with regions A, B, C and D:
  A: award of 10,000 yen
  B, C, D: penalty of 1,000,000 yen
large regions: P(A) is large, but P(B), P(C), P(D) are also large;
  P(miss) is small
small regions: P(A) is small, and P(B), P(C), P(D) are small;
  P(miss) is large
summary of today’s class
Hamming code
one-bit error correcting
perfect code
the minimum distance and minimum weight
handy measure of the error correcting capability
a larger minimum distance means more correcting power
exercise
Define (one of the) (15, 11) Hamming codes:
construct a parity check matrix, and
determine the corresponding generator matrix
Let C be a linear code with the following parity check matrix.
Show that the minimum distance of C is 4.
1

1
H 
1

0
1 1 0 1 0 0 0

1 0 1 0 1 0 0
.
0 1 1 0 0 1 0

1 1 1 0 0 0 1