
• Uncertainty
• The axioms of probability
• Bayes rule
• Belief networks
1
Uncertainty
In both propositional logic and first-order logic reasoning, it is assumed that
• facts are known to be true,
• facts are known to be false,
• or nothing is known.
In general, however, people are not sure whether the relevant facts are true or false, and things become unpredictable due to
• partial observability (e.g. road state, other drivers' plans)
• noisy sensors (e.g. radio traffic reports)
• uncertainty in action outcomes (flat tire, accident)
Rules may give unreliable conclusions, e.g.:
At = leaving for the airport t minutes before the flight; will At get me to the airport on time?
toothache → cavity? gum disease?
2
Probabilistic Reasoning
• Probability theory provides a way of dealing rationally with uncertainty, by assigning a numerical degree of belief between 0 and 1 to sentences.
e.g., P(cavity) = 0.1 indicates that a patient has a cavity with probability 0.1 (a 10% chance).
• Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic.
• Probabilistic reasoning may be used in the following three types of situations:
- The world is really random.
- The relevant world is not random given enough data, but we do not always have access to that much data.
- The world appears to be random because we have not described it at the right level.
3
Necessary formulas
• Definition of conditional probability:
P(A|B) = P(A ∧ B) / P(B)
where P(A|B) is read as "the probability of A given that all we know is B".
For example, P(cavity|toothache) = 0.8 indicates that if a patient is observed to have a toothache and there is no other information, then the probability of the patient having a cavity is 0.8 (an 80% chance).
• The product rule gives an alternative formulation:
P(A ∧ B) = P(A|B)P(B)
It comes from the fact that for A and B to be true, we need B to be true, and then A to be true given B.
You can also write
P(A ∧ B) = P(B|A)P(A)
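The product rule can be checked numerically; a minimal Python sketch, using P(Toothache) = 0.05 and P(Cavity|Toothache) = 0.8 from the full-joint example later in these slides:

```python
# Product rule: P(A ∧ B) = P(A|B) P(B).
# P(Toothache) = 0.05 and P(Cavity|Toothache) = 0.8 are taken from the
# full-joint example later in these slides.
p_b = 0.05           # P(Toothache)
p_a_given_b = 0.8    # P(Cavity | Toothache)
p_a_and_b = p_a_given_b * p_b
print(p_a_and_b)     # 0.8 × 0.05 = 0.04
```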
4
The axioms of probability
For any propositions A, B:
• 0 ≤ P(A) ≤ 1
All probabilities are between 0 and 1.
• P(True) = 1 and P(False) = 0
Necessarily true propositions have probability 1, and necessarily false propositions have probability 0.
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
The total probability of A ∨ B is the sum of the probabilities assigned to A and B, but with P(A ∧ B) subtracted out so that those cases are not counted twice.
(Venn diagram: P(A ∨ B) is the total probability mass in the A-only region, the B-only region, and the overlap A ∧ B.)
5
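The third axiom can be verified on concrete numbers; a small sketch using the toothache/cavity figures that appear later in these slides (P(Cavity) = 0.10, P(Toothache) = 0.05, P(Cavity ∧ Toothache) = 0.04):

```python
# Inclusion-exclusion axiom: P(A ∨ B) = P(A) + P(B) − P(A ∧ B),
# with A = Cavity, B = Toothache and the numbers from the
# full-joint table later in these slides.
p_a, p_b, p_a_and_b = 0.10, 0.05, 0.04
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)  # 0.10 + 0.05 − 0.04 = 0.11
```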
Chain rule
• It is derived by successive application of the product rule:
P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
= P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
= …
= P(X1) P(X2|X1) … P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
= Π_{i=1}^{n} P(Xi | X1, …, Xi-1)
• For example, we can calculate the probability of the event that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call (see the conditional probability tables on the next page):
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
= P(J|A) P(M|A) P(A | ¬B ∧ ¬E) P(¬B) P(¬E)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998
≈ 0.00062
6
Belief network for the burglary example:

  P(B) = 0.001 (burglary)      P(E) = 0.002 (earthquake)

  Alarm:                 JohnCalls:        MaryCalls:
  B E | P(A)             A | P(J)          A | P(M)
  T T | 0.95             T | 0.90          T | 0.70
  T F | 0.94             F | 0.05          F | 0.01
  F T | 0.29
  F F | 0.001
7
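The tables of the burglary network are enough to redo the chain-rule computation from page 6 in code; a minimal sketch:

```python
# The burglary network's tables as Python dicts, used to redo the
# chain-rule computation P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) from page 6.
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(J | A)
p_m = {True: 0.70, False: 0.01}                      # P(M | A)

prob = (p_j[True] * p_m[True]
        * p_a[(False, False)]        # P(A | ¬B ∧ ¬E)
        * (1 - p_b) * (1 - p_e))     # P(¬B) P(¬E)
print(prob)  # ≈ 0.00062, as on page 6
```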
Bayes’ rule
• Recall the two forms of the product rule:
P(A ∧ B) = P(A|B)P(B)
P(A ∧ B) = P(B|A)P(A)
Equating the right-hand sides and dividing by P(A) gives Bayes' rule:
P(B|A) = P(A|B)P(B) / P(A)
Why is Bayes' rule useful?
It can be used for assessing a diagnostic probability from a causal probability (the probability of an effect brought about directly by a cause):
P(Cause|Effect) = P(Effect|Cause)P(Cause) / P(Effect)
E.g., let C be cavity and T be toothache:
P(C|T) = P(T|C)P(C) / P(T) = (0.5 × 0.0001) / 0.1 = 0.0005
8
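The cavity/toothache computation above can be sketched directly:

```python
# Bayes' rule: diagnostic probability from causal probability,
# with the slide's numbers for cavity (C) and toothache (T).
p_t_given_c = 0.5    # P(T | C)
p_c = 0.0001         # P(C)
p_t = 0.1            # P(T)
p_c_given_t = p_t_given_c * p_c / p_t
print(p_c_given_t)   # (0.5 × 0.0001) / 0.1 = 0.0005
```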
Normalization
Consider the equation for calculating the probability of cavity given a toothache:
P(C|T) = P(T|C)P(C) / P(T)
Suppose we are also concerned with the possibility that the patient is suffering from gum disease G, given a toothache:
P(G|T) = P(T|G)P(G) / P(T)
Comparing these two equations, we see that in order to compute the relative likelihood of cavity and gum disease, given a toothache, we need not assess the prior probability P(T), since we have
P(C|T) / P(G|T) = P(T|C)P(C) / (P(T|G)P(G)) = (0.5 × 0.0001) / (0.8 × 0.005) = 1/80
That is, gum disease is 80 times more likely than a cavity, given a toothache.
9
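The cancellation of P(T) can be seen in code; a minimal sketch with the slide's numbers:

```python
# The prior P(T) cancels in the ratio, so only the numerators matter.
num_c = 0.5 * 0.0001   # P(T|C) P(C)
num_g = 0.8 * 0.005    # P(T|G) P(G)
ratio = num_c / num_g
print(ratio)  # 0.0125 = 1/80
```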
Conditioning
• Introducing a variable as an extra condition:
P(X|Y) = Σ_z P(X | Y, Z=z) P(Z=z | Y)
e.g., P(RunOver|Cross)
= P(RunOver | Cross, Light=green) P(Light=green | Cross)
+ P(RunOver | Cross, Light=yellow) P(Light=yellow | Cross)
+ P(RunOver | Cross, Light=red) P(Light=red | Cross)
• When Y is absent, we have
P(X) = Σ_z P(X | Z=z) P(Z=z) = Σ_z P(X, Z=z)
P(RunOver)
= P(RunOver | Light=green) P(Light=green)
+ P(RunOver | Light=yellow) P(Light=yellow)
+ P(RunOver | Light=red) P(Light=red)
The above equation expresses the conditional independence of RunOver and Cross given Light (here X = RunOver and Y = Cross are conditionally independent given Z = Light).
10
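Marginalizing over Light can be sketched as follows; note that the light-colour distribution and run-over probabilities below are made-up illustration values, not from the slides:

```python
# Marginalization: P(RunOver) = Σ_z P(RunOver | Light=z) P(Light=z).
# The numbers below are made-up illustration values, not from the slides.
p_light = {"green": 0.6, "yellow": 0.1, "red": 0.3}             # P(Light)
p_runover_given_light = {"green": 0.01, "yellow": 0.05, "red": 0.20}

p_runover = sum(p_runover_given_light[z] * p_light[z] for z in p_light)
print(p_runover)  # 0.6·0.01 + 0.1·0.05 + 0.3·0.20 = 0.071
```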
Full joint distributions
That is, the unconditional probability of any proposition is computable
as the sum of entries from the full joint distribution.
•For any proposition  defined on the random variables
(wi) is true or false
unconditional
無条件の
•  is equivalent to the disjunction of wi’s where (wi) is true, hence
equivalent to
同等で
P() = {wi: (wi) } P(wi)
Conditional probability can be computed in the same way as a ratio.
P( |) =
P(  )
P()
joint 共
同の
e.g., Suppose Toothache and Cavity are the random variables:
w1=Cavity
w2= Toothache
Toothache = True Toothache = False
Cavity = True
Cavity = False
P(Cavity |Toothache) =
P(w1) = 0.04 + 0.06 = 0.10
P(w2) = 0.04 + 0.01 = 0.05
0.04
0.01
0.06
0.89
P(Cavity  Toothache)
P(Toothache)
= (0.04) / (0.04 + 0.01) = 0.8
11
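Summing over atomic events generalizes to any proposition φ; a minimal sketch where φ is passed as a predicate over atomic events (the helper names prob and cond_prob are our own):

```python
# P(φ) as a sum of full-joint entries; φ is a predicate over atomic
# events. The helper names prob and cond_prob are our own.
joint = {  # (cavity, toothache) -> probability, the table on this page
    (True,  True):  0.04, (True,  False): 0.06,
    (False, True):  0.01, (False, False): 0.89,
}

def prob(phi):
    """P(φ) = Σ P(w) over atomic events w where φ(w) is true."""
    return sum(p for w, p in joint.items() if phi(w))

def cond_prob(phi, psi):
    """P(φ|ψ) = P(φ ∧ ψ) / P(ψ)."""
    return prob(lambda w: phi(w) and psi(w)) / prob(psi)

cavity = lambda w: w[0]
toothache = lambda w: w[1]
print(prob(cavity))                  # 0.04 + 0.06 = 0.10
print(cond_prob(cavity, toothache))  # 0.04 / 0.05 = 0.8
```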
Independence
• Two random variables A and B are (absolutely) independent iff
P(A|B) = P(A), or P(A, B) = P(A|B)P(B) = P(A)P(B)
e.g., if A and B are two coin flips:
P(A=head, B=head) = P(A=head)P(B=head) = 0.5 × 0.5 = 0.25
• If n Boolean variables are independent, the full joint is
P(X1, …, Xn) = Π_{i=1}^{n} P(Xi)
Absolute independence is a very strong requirement, seldom met!
• Conditional independence:
P(A|B, C) = P(A|C)
We say that A is conditionally independent of B given C.
e.g., P(Catch|Toothache, Cavity) = P(Catch|Cavity)
P(Catch|Toothache, ¬Cavity) = P(Catch|¬Cavity)
This means that whether the patient has a cavity or not, the probability that the probe catches in it does not depend on whether the patient has a toothache.
12
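The coin-flip case of absolute independence, as a one-line check:

```python
# Absolute independence: the joint of two fair coin flips factors
# into the product of the marginals.
p_head = 0.5
p_both_heads = p_head * p_head   # P(A=head, B=head) = P(A=head) P(B=head)
print(p_both_heads)  # 0.25
```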
Belief networks
• A simple, graphical notation for conditional independence assertions and
hence for compact specification of full joint distributions.
• Belief networks are also called “causal nets”, “Bayes nets”,
or “influence diagrams”.
For example, the figure on page 7 shows a belief network.
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
(The network and its conditional probability tables are given on page 7.)
13
Syntax and Semantics
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (a link means "directly influences")
• a conditional probability distribution for each node given its parents:
P(Xi | Parents(Xi))
Semantics:
• "Global" semantics defines the full joint probability distribution as the product of the local conditional distributions:
P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi))
e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A)P(M|A)P(A | ¬B ∧ ¬E)P(¬B)P(¬E)
14
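The global semantics can be written as a function that multiplies the local tables of the page-7 burglary network; a minimal sketch:

```python
# Global semantics of the page-7 burglary network: the full joint is
# the product of each node's local conditional distribution.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a, j, m):
    def f(p, x):                       # P(X=x) when P(X=True) = p
        return p if x else 1 - p
    return (f(0.001, b) * f(0.002, e) * f(P_A[(b, e)], a)
            * f(0.90 if a else 0.05, j) * f(0.70 if a else 0.01, m))

# The slide's example, P(J ∧ M ∧ A ∧ ¬B ∧ ¬E):
print(joint(False, False, True, True, True))  # ≈ 0.00062
```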
Constructing Bayes nets
• The equation in the global semantics defines what a given belief network means. It does not explain how to construct a belief network.
• The equation implies certain conditional independence relationships that can be used to guide construction of the topology of the network:
P(X1, …, Xn) = Π_{i=1}^{n} P(Xi | Parents(Xi))    ----(1)
If we rewrite the joint using the definition of conditional probability:
P(x1, …, xn) = P(xn | xn-1, …, x1) P(xn-1, …, x1)
= …
= P(xn | xn-1, …, x1) P(xn-1 | xn-2, …, x1) … P(x2|x1) P(x1)
= Π_{i=1}^{n} P(xi | xi-1, …, x1)    ----(2)
• Comparing (1) and (2), we see that specifying the joint is equivalent to the general assertion that
P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi))
provided that Parents(Xi) ⊆ {Xi-1, …, X1}, i.e. parents are selected from Xi-1, …, X1.
15
Constructing Bayes nets (continue …)
The general procedure for incremental network construction is as follows:
1. Choose the set of relevant variables Xi that describe the domain.
2. Choose an ordering for the variables.
3. While there are variables left:
(a) Pick a variable Xi and add a node to the network for it.
(b) Set Parents(Xi) to some minimal set of nodes already in the net such that the conditional independence property is satisfied.
(c) Define the conditional probability table for Xi.
For example, we choose the ordering M, J, A, B, E in the burglary example.
Add MaryCalls.
Add JohnCalls: P(J|M) = P(J)? No. If Mary calls, that probably means the alarm has gone off, which of course would make it more likely that John calls.
Add Alarm: P(A|J,M) = P(A|J)? No. P(A|J,M) = P(A)? No. If both call, it is more likely that the alarm has gone off than if just one or neither calls.
Add Burglary: P(B|A,J,M) = P(B)? No. P(B|A,J,M) = P(B|A)? Yes. The alarm likely gives us information about a burglary.
Add Earthquake: P(E|B,A,J,M) = P(E|A)? No. P(E|B,A,J,M) = P(E|A,B)? Yes. If the alarm is on, it is more likely that there is an earthquake; but if we know there has been a burglary, that changes the probability of an earthquake given the alarm.
16
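The first answer above (P(J|M) ≠ P(J)) can be checked by brute-force enumeration of the full joint of the page-7 burglary network; a minimal sketch:

```python
# Checking P(J|M) ≠ P(J) by enumerating the full joint of the
# page-7 burglary network over all 2^5 assignments.
from itertools import product

P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a, j, m):
    def f(p, x):
        return p if x else 1 - p
    return (f(0.001, b) * f(0.002, e) * f(P_A[(b, e)], a)
            * f(0.90 if a else 0.05, j) * f(0.70 if a else 0.01, m))

worlds = list(product([True, False], repeat=5))       # (b, e, a, j, m)
p_j = sum(joint(*w) for w in worlds if w[3])          # P(J)
p_m = sum(joint(*w) for w in worlds if w[4])          # P(M)
p_jm = sum(joint(*w) for w in worlds if w[3] and w[4])  # P(J ∧ M)
print(p_j, p_jm / p_m)  # P(J) ≈ 0.052, while P(J|M) ≈ 0.18
```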
Exercises
Ex1.
According to the following tables, calculate the probability of the event that the alarm has sounded, a burglary has occurred but an earthquake has not occurred, and both John and Mary call.
  P(B) = 0.95 (burglary)       P(E) = 0.002 (earthquake)

  Alarm:                 JohnCalls:        MaryCalls:
  B E | P(A)             A | P(J)          A | P(M)
  T T | 0.95             T | 0.90          T | 0.70
  T F | 0.94             F | 0.05          F | 0.01
  F T | 0.29
  F F | 0.001
17
Exercises
Ex2. (optional)
Construct a belief network for the burglary
example. Let us choose the ordering M, J, E, B, A.
18