AdaBoost
Reference
Yoav Freund and Robert E. Schapire. A decision-theoretic
generalization of on-line learning and an application to
boosting. Journal of Computer and System Sciences,
55(1):119-139, 1997.
“Two heads are better than one.”
三人寄れば文殊の知恵
Boosting is a technique of constructing a strong classifier
by combining a collection of weak classifiers.
Training dataset
x1
T1 T2 T3 T4
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
1
1
0
1
0
1
1
0
1
1
0
0
1
0
1
1
1
1
1
0
0
0
0
1
0
0
0
1
0
1
1
1
1
0
0
1
1
1
1
1
1
1
1
0
0
0
目標属性
(Objective Attribute)
1
1
1
0
0
0
0
0
1
1
1
1
1
0
0
0
y1
( x1 , y1 )
( xN , y N )
Basic Idea of AdaBoost
• Initially assign an equal weight to each record.
• Iterate the following steps:
1. Generate a hypothesis the error ratio of which
is smaller than that of random guessing.
2. Increase weights of mis-predicted records
relatively higher than the others.
We call a classifier “a hypothesis” following the terms in the cited paper.
hypothesis
T1 T2 T3 T4
Ob
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
0
1
0
1
1
1
1
1
1
0
0
0
1
1
1
0
0
1
1
1
Size of
Weight
if T1=1
then Ob=0
else Ob=1
0
0
0
0
0
0
0
0
represents the degree of the weight.
New
Weight
another
hypothesis
T1 T2 T3 T4
Ob
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
0
1
0
1
1
1
1
1
1
0
0
0
1
1
1
0
0
1
1
1
Weight
if T3=1
then Ob=1
else Ob=0
1
1
1
1
1
0
0
0
New
Weight
another
hypothesis
T1 T2 T3 T4
Ob
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
0
1
0
1
1
1
1
1
1
0
0
0
1
1
1
0
0
1
1
1
Weight
if T4=1
then Ob=1
else Ob=0
1
1
1
0
0
1
1
1
New
Weight
Hypotheses
if T1=1
if T3=1
if T4=1
Simple
then Ob=0
then Ob=1
then Ob=1
Majority
T1 T2 T3 T4 Ob
else Ob=1
else Ob=0
else Ob=0
Voting
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
1
1
1
1
1
0
0
0
1
1
1
0
0
1
1
1
1
1
1
0
0
0
0
0
0
0
1
1
0
1
0
1
1
1
1
1
1
0
0
0
1
1
1
0
0
1
1
1
1
1
1
0
0
0
0
0
AdaBoost performs weighted majority voting by hypotheses.
Details of AdaBoost
Input
Training dataset
( x1 , y1 ),...,( xN , yN )
Initial weight
wi D(i) 1
1
N
for each i =1,2, …, N
Weak learner WeakLearn that always ouputs
a hypothesis whose error ratio is less than ½.
T: number of iterations to generate weak hypothesis
For each t 1, ...,T , iterate:
1: Compute the distribution pit of each record by normalizing weights
pi wi
t
t
N
i 1
wi
t
2: Call WeakLearn to generate such a weak hypothesis ht that
εt i 1 pt ht ( xi ) yi 1
N
t
2
0 if ht is correct
ht ( xi ) yi
1 otherwise
3: Revise weights
βt εt ( 1 εt )
t 1
i
w
1 ht ( xi ) yi
t
w β
t
i
Revision of weights
t 1
i
w
T 1
i
w
1 ht ( xi ) yi
t
w β
t
i
T
1 ht ( xi ) yi
t
D(i ) β
i 1
1 ht ( xi ) yi
t
β
βt
1
βt 1 because
if ht is correct
otherwise
εt 1
wi1 wi2 wiT 1
2
Output: Final hypothesis hf (Weighted Majority Voting among ht)
1
T
T
1 if t 1 ( log βt )ht ( x ) t 1 ( log βt ) *
hf ( x)
2
0 otherwise
If h f ( xi ) yi , β
T
Lemma1
1 ht ( xi ) yi
t
i 1
βt
i 1
T
1
2
T
If h f ( xi ) 1 and yi 0 : Add t 1 (log βt ) to both sides of (*).
1
T
(log
β
)(
1
h
(
x
))
(log
β
)
t 1
t 1
t
t
t
2
T
If h f ( xi ) 0 and yi 1 :
1
T
t 1 ( log βt )ht ( x ) t 1 ( log βt ) 2
T
1
T
(log
β
)
h
(
x
)
(log
β
)
t 1
t 1
t
t
t
2
T
Observe 1 ht ( x ) 1 ht ( xi ) yi
Observe ht ( x ) 1 ht ( xi ) yi
ε
Definition
D(i)
{i | h f ( xi ) yi }
T 1
βt
i1 wi ε
i 1
N
Lemma2
T
1 ht ( x j ) y j
t
wiT 1 D(i) β
j 1
T 1
w
i 1 i
N
The error of hf for the initial distribution.
T
1
2
T
D(i) βt
j 1
T 1
w
i
{i | h f ( x i ) y i }
D (i ) βt
{i | h ( x ) y } i 1
f i i
T
1
2
1
2
By Lemma1.
ε
T heorem
T
2
εt (1 εt )
t 1
T
ε βt
i 1
1
2
Lemma 2
N
i 1
T 1
i
w
t 1
w
i 1 i
N
N
i 1
1 ht ( xi ) yi
t
wβ
t
i
i 1 w (1 (1 βt )(1 ht ( xi ) yi ))
N
α γ 1 (1 α)γ
t
i
i 1 w (1 βt )i 1 w (1 ht ( xi ) yi )
N
t
i
N
t
i
N
N
1 εt i 1 pit i 1 pit ht ( xi ) yi
i 1 p (1 ht ( xi ) yi )
N
t
i
N
i 1 w (1 ht ( xi ) yi ) (i 1 wit )
N
(i 1 wit )(1 (1 βt )(1 εt ))
t
i
N
(i 1 wit ) 2εt
N
εt
1 2εt
1 βt 1
1 εt 1 εt
t 1
i
N
w
i 1
N
T 1
i
w
i 1
T
ε βt
i 1
ε
T
1
(i 1 w ) 2εt
N
2
N
w
1
i 1
1
i
T
2
ε
t
t 1
2εt βt
t 1
t
i
N
i 1
1
2
T 1
i
w
T
By Lemma2.
2
εt (1 εt )
t 1
because βt εt ( 1 εt ) .
Q.E.D
© Copyright 2026 ExpyDoc