Information Theory

In the beginning was the Word...
情報理論:日本語,英語で隔年開講
今年度は日本語で授業を行うが,スライドは英語のものを使用
Information Theory: English and Japanese, alternate years
the course will be taught in Japanese in this year
video-recorded English classes  Lecture Archives 2011
Slides are in English
this slide can be found at http://apal.naist.jp/~kaji/lecture/
test questions are given in both of Japanese and English
1
Information Theory
Information Theory (情報理論)
is founded by C. E. Shannon in 1948
focuses on mathematical theory of communication
gave essential impacts on today’s digital technology
wired/wireless communication/broadcasting
CD/DVD/HDD
Claude E. Shannon
1916-2001
data compression
cryptography, linguistics, bioinformatics, games, ...
In this class, we learn basic subjects of information theory.
(half undergraduate level + half graduate school level)
2
class plan
This class consists of four chapters (+ this introduction):
chapter 0: the summary and the schedule of this course
(today)
chapter 1: measuring information
chapter 2: compact representation of information
chapter 3: coding for noisy communication
chapter 4: cryptography
3
what’s the problem?
To understand our problem, date back to 1940s...
Teletype (電信) was widely used for communication.
Morse code: dots ( ∙ ) and dashes ( − )
dot = 1 unit long, dash = 3 units long
1 unit silence between marks
3 units silence between letters, etc.
10111000111000000010101010001110111011100011101110001
They already had “digital communication”.
4
machinery for information processing
No computers yet, but there were “machines”...
Teletype model 14-KTR, 1940
http://www.baudot.net/teletype/M14.htm
Enigma machine
http://enigma.wikispaces.com/
They could do something complicated.
The transmission/recording of messages were...
inefficient...messages should be as short as possible
unreliable...messages are often disturbed by noises
The efficiency and the reliability were two major problems.
5
the model of communication
A communication system can be modeled as;
C.E. Shannon, A Mathematical Theory of Communication,
The Bell System Technical Journal, 27, pp. 379–423, 623–656, 1948.
encoder,
modulator,
codec, etc...
channel,
storage medium,
etc...
6
what is the “efficiency”?
A communication is efficient if the size of B is small.
subject to A = D, or A ≈ D
with, or without noise (B ≠ C, or B = C)
A
B
C
D
7
problem one: efficiency
Example: You need to record the weather of Tokyo everyday.
weather = {sunny, cloudy, rainy}
You can use “0” and “1”, but you cannot use blank spaces.
weather codeword
sunny
00
cloudy
01
rainy
10
0100011000
2-bit record everyday
200 bits for 100 days
Can we shorten the representation?
8
better code?
weather
sunny
cloudy
rainy
code A
00
01
10
code B
00
01
1
code A...0100011000
code B...010001100
The code B gives shorter representation than the code A.
Can we decode the code B correctly?
Yes, as far as the sequence is processed from the beginning.
Is there a code which is more compact than this code B?
No, and Yes(→ next slide).
9
think average
Sometimes, events are not equally likely...
weather probability
sunny
0.5
cloudy
0.3
rainy
0.2
code A
00
01
10
code B code C
00
1
01
01
1
00
with the code A, 2.0 bit / event (always)
with the code B,
20.5 + 20.3 + 10.2 = 1.8 bit / event in average
with the code C,
10.5 + 20.3 + 20.2 = 1.5 bit / event in average
10
the best code?
Can we represent information with 0.00000000001 bit per event?
...No, maybe.
It is likely that there is a “limit” which we cannot get over.
Shannon investigated the limit mathematically.
→ For this event set, we need 1.485 or more bit per event.
weather probability
sunny
0.5
cloudy
0.3
rainy
0.2
This is the amount of information
which must be carried by the code.
11
class plan in April
chapter 0: the summary and the schedule of this course
chapter 1: measuring information
We establish a mathematical means to measure
information in a quantitative manner.
chapter 2: compact representation of information
We learn several coding techniques which give compact
representation of information.
chapter 3: coding for noisy communication
chapter 4: cryptography
12
what is the “reliability”?
A communication is reliable if A = D or A ≈ D.
the existence of noise is essential (B ≠ C)
How small can we make the size of B?
A
B
C
D
13
problem two: reliability
Communication is not always reliable.
transmitted information ≠ received information
ABCADC
ABCABC
Errors of this kind are unavoidable in real communication.
In the usual conversation, we sometimes use phonetic codes.
ABC
Alpha, Bravo, Charlie
あさひの「あ」
いろはの「い」
Alpha, Bravo, Charlie
ABC
14
phonetic code
the real
information
redundant (冗長な) information
for correcting possible errors
A phonetic code adds redundant information.
The redundant part helps correcting possible errors.
→use this mechanism over 0-1 data, and we can correct errors!
15
redundancy
Q. Can we add “redundancy” to binary data?
A. Yes, use parity bits.
A parity bit is...
a binary digit which is to make the number of 1’s in data even.
00101 → 001010 (two 1’s → two 1’s)
11010 → 110101 (three 1’s → four 1’s)
One parity bit may tell you that there are odd numbers of errors,
but not more than that.
16
to correct error(s)
basic idea: use several parity bits to correct errors
Example: Add five parity bits to four-bits data (a0, a1, a2, a3).
a0
a1
p0
a2
a3
p1
codeword =
(a0, a1, a2, a3, p0, p1, q0, q1, r)
This code corrects one-bit error,
but it is too straightforward.
q0
q1
r
17
class plan in May
chapter 0: the summary and the schedule of this course
chapter 1: measuring information
chapter 2: compact representation of information
chapter 3: coding for noisy communication
We study practical coding techniques for finding and
correcting errors.
chapter 4: cryptography
We review techniques for protecting information from
intensive attacks.
18
schedule
(Mon)
April
May
June
04
Tue
10
17
24
01
×
08
15
22
29
05
Thu
12
19
26
03
×
10
×
17
24
31
report (quiz):
will be assigned by
the end of April
test:
questions given in English/Japanese
statistics in 2011: A ... 51 / B ... 20 / C ... 18 / did not pass ... 13
19
chapter 1:
measuring information
20
motivation
“To tell plenty of things, we need more words.”
...maybe true, but can you give the proof of this statement?
We will need to...
1.
measure information quantitatively (定量的に測る)
2.
observe the relation between the amount of information
and its representation.
Chapter 1 focuses on the first step above.
21
the uncertainty (不確実さ)
Information tells what has happened at the information source.
Before you receive information, there is much uncertainty.
After you receive information, the uncertainty becomes small.
the difference of uncertainty  the amount of information
FIRST, we need to measure the uncertainty of information source.
this difference indicates
the amount of information
much
uncertainty
Before
After
small
uncertainty
22
the definition of uncertainty
The uncertainty is defined according to the statistics (統計量),
BUT,
we do not have enough time today....
In the rest of today’s talk,
we study two typical information sources.
memoryless & stationary information source
Markov information source
23
assumption
In this class, we assume that...
an information source produces one symbol per unit time
(discrete-time information source)
the set of possible symbols is finite and countable (有限可算)
(digital information source)
Note however that, in the real world,
there are continuous-time and/or analogue information sources.
cf. sampling & quantization
24
Preliminary (準備)
Assume a discrete-time digital information source S:
M = {a1, ..., ak}... the set of symbols of S
(S is said to be a k-ary information source.)
Xt...the symbol which S produces at time t
The sequence X1, ..., Xn is called a message produced by S.
Example: S = fair dice
if the message is
, then
25
memoryless & stationary information source
A memoryless & stationary information source satisfies...
memoryless condition: 𝑃𝑋𝑡 |𝑋1 …𝑋𝑡−1 𝑎𝑡 𝑎1 … 𝑎𝑡−1 = 𝑃𝑋𝑡 (𝑎𝑡 )
“A symbol is chosen independently from past symbols.”
stationary condition: 𝑃𝑋𝑡 𝑎 = 𝑃𝑋1 𝑎 for any t
“The probability distribution is time invariant.”
123456...
trial 1 ajcgea...
trial 2 gajkfh...
trial 3 wasdas...
:
:
memoryless = 無記憶
stationary = 定常
the same probability distribution
26
memoryless & stationary information source
Examples of memoryless & stationary information source:
the “dice” example, coin toss, ...
information sources with memory:
English text: 𝑃𝑋𝑡 |𝑋𝑡−1 𝑢 𝑞 ≫ 𝑃𝑋𝑡 |𝑋𝑡−1 𝑢 𝑢
wireless communication...burst noise
not-stationary information sources:
weather...P(snow) is large in winter
and more?
27
Markov information source
Markov information source
a simple model of information source with memory
The choice of the next symbol depends on
at most m previous symbols
Andrey Markov
(m-th order Markov source)
1856-1922
𝑃𝑋𝑡 |𝑋1 …𝑋𝑡−1 𝑎𝑡 𝑎1 … 𝑎𝑡−1 = 𝑃𝑋𝑡 |𝑋𝑡−𝑚…𝑋𝑡−1 𝑎𝑡 𝑎𝑡−𝑚 … 𝑎𝑡−1
m = 0  memoryless source
m = 1  simple Markov source
28
Example of (simple) Markov source
S ... memoryless & stationary source with P(0) = q, P(1) = 1 – q
Xt
S
R
1-bit register
if Xt–1 = 0, then R = 0:
S = 0  Xt = 0 ... PXt|Xt–1(0 | 0) = q
S = 1  Xt = 1 ... PXt|Xt–1(1 | 0) = 1 – q
if Xt–1 = 1, then R = 1:
S = 0  Xt = 1 ... PXt|Xt–1(1 | 1) = q
S = 1  Xt = 0 ... PXt|Xt–1(0 | 1) = 1 – q
29
Markov source as a finite state machine
m-th order k-ary Markov source:
The next symbols depends on previous m symbols.
The model is having one of km internal states.
The state changes when a new symbol is generated.
 finite state machine
1 / 1–q
Xt
S
R
1-bit register
0/q 0
generated
symbol
1 1/q
0 / 1–q
probability
30
two important properties
irreducible (既約) Markov source:
We can move to any state from any state.
A
this example is NOT irreducible
B
C
aperiodic (非周期的) Markov source:
We have no periodical behavior (strict discussion needed...).
A
B
this example is NOT aperiodic
irreducible + aperiodic = regular
31
example of the regular Markov source
0/0.9
1/0.1
A
B
0/0.8
1/0.2
start from the state 0
time 1
2
3
P (state=A) 1.0 0.9 0.89
P (state=B) 0.0 0.1 0.11
start from the state 1
time 1
2
3
P (state=A) 0.0 0.8 0.88
P (state=B) 1.0 0.2 0.12
4
0.889
0.111
4
0.888
0.112
...
...
...
...
...
...
converge (収束する) to
the same probabilities
stationary probabilities
32
computation of the stationary probabilities
0/0.9
A
t : P(state = A) at time t
1/0.1
t : P(state = B) at time t
B
t+1 = 0.9t + 0.8t
0/0.8
1/0.2 t+1 = 0.1t + 0.2t
t+1+ t+1= 1
If t and t converge to  and , respectively, then
we can put t+1=t= and t+1=t=.
 = 0.9 + 0.8
 = 0.1 + 0.2
=8/9, =1/9
+= 1
33
Markov source as a stationary source
After enough time has elapsed...
a regular Markov source can be regarded as a stationary source
0/0.9
1/0.1
A
=8/9, =1/9
B
0/0.8
1/0.2
0 will be produced with probability P(0) = 0.9  + 0.8  = 0.889
1 will be produced with probability P(1) = 0.1  + 0.2  = 0.111
34
summary of today’s class
overview of this course
motivation
four chapters
typical information sources
memoryless & stationary source
Markov source
35
exercise
Determine the stationary probabilities.
Compute the probability that 010 is produced.
A
0/0.4 0/0.5
B
0/0.8
1/0.2
1/0.6
C
1/0.5
This is to check your understanding.
This is not a report assignment.
36