March 4th Scribe Notes
Jonathan W. Yu and Hannah Worrall
36-464/36-664: Applied Multivariate Methods
March 11, 2014
Classification Trees

1 Definition

• Classification Trees (CT): predict a qualitative response instead of a quantitative one. The prediction at each terminal node is the most common class among all the training observations that fall into that node.

• Regression Trees (RT): predict a quantitative response. The prediction at each terminal node is the mean of all the training observations that fall into that node (see the sketch below).
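As a small illustration of these two prediction rules, here is a minimal R sketch (not from the notes; the node contents are made up):

# Hypothetical responses falling into one terminal node
node.classes <- c("Yes", "Yes", "No", "Yes")
names(which.max(table(node.classes)))   # CT prediction: the most common class ("Yes")

node.values <- c(7.2, 9.1, 8.4)
mean(node.values)                       # RT prediction: the mean of the node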
2 How to grow CT
RSS cannot be used to make splits, since we have a categorical response. Instead, we use the classification error rate to determine how best to split the tree:

E = 1 - \max_k \hat{p}_{mk}

where \hat{p}_{mk} is the proportion of training observations in the m-th region that are from class k. This measure is not sensitive enough for growing the tree, so two other measures are preferred.

1. Gini Index

G = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})

The Gini index takes a small value if all of the \hat{p}_{mk} are close to 0 or 1 (⇒ minimize it for less error in assignments). The Gini index is a measure of node purity, with a small value indicating a pure node.
2. Cross-Entropy

D = - \sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk} \ge 0

Again, this is a measure of node purity, and small values indicate a pure node.

Both the Gini index and cross-entropy are more sensitive to node purity (the quality of a tree split) than the classification error rate is. If we are mainly concerned with prediction accuracy, we look at the classification error rate instead.
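To make the three measures concrete, here is a minimal R sketch (not from the notes) that computes the classification error rate, Gini index, and cross-entropy from a vector of class proportions \hat{p}_{mk} for a single node:

# Hypothetical class proportions p-hat_mk within one node
p <- c(0.80, 0.15, 0.05)

E <- 1 - max(p)        # classification error rate
G <- sum(p * (1 - p))  # Gini index
D <- -sum(p * log(p))  # cross-entropy

c(error = E, gini = G, entropy = D)
# A purer node, e.g. p <- c(0.98, 0.01, 0.01), gives smaller values of all three.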
3 Pruning

We can use any of the three approaches above to check the effects of pruning. Pruning will generally improve the error rate (see Figure 2 in the Appendix).
4 Example: Carseats

Sales is a continuous response variable ⇒ High is its binary representation: Yes (Sales > 8) and No (Sales ≤ 8).

# Carseats is in the ISLR package; create the binary response High from Sales
library(ISLR)
library(tree)
High <- as.factor(ifelse(Carseats$Sales > 8, "Yes", "No"))
Carseats <- data.frame(Carseats, High)

# Regress High on all variables except Sales
tree.carseats <- tree(High ~ . - Sales, Carseats)
summary(tree.carseats)

# Plot the tree (see Figure 1 in the Appendix)
plot(tree.carseats)
text(tree.carseats, pretty = 0)

Note: the "Misclassification Error Rate" reported by summary() is the training error rate, i.e. the percentage of misclassified training observations.

To properly judge the classification performance of the CT, we should look at the test error rather than the simple training error: divide the total number of correct predictions by the total number of observations in the test data.

set.seed(2)
train <- sample(1:nrow(Carseats), 200)
Carseats.test <- Carseats[-train, ]
High.test <- High[-train]
tree.carseats <- tree(High ~ . - Sales, Carseats, subset = train)
tree.pred <- predict(tree.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)
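Following the note above, one way to get the test error is to compute the test accuracy directly from the confusion table (a small sketch; the exact counts depend on the random split):

conf.mat <- table(tree.pred, High.test)
sum(diag(conf.mat)) / sum(conf.mat)   # fraction of test observations classified correctly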
5 Summary

• Regression trees and classification trees are easy to understand.
• They can be displayed graphically and can handle qualitative responses.
• However, a single tree does not necessarily have the best prediction accuracy, and it has high variance.
6 Future: Random Forests

Random forests are useful because individual trees suffer from high variance. You can use bagging for predictions, but the bagged trees are correlated; random forests are a way to decorrelate the trees.
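A minimal sketch of this idea, assuming the randomForest package and reusing the Carseats objects from Section 4 (the notes do not name a package, and their Figure 3 uses a different dataset): bagging is the special case where all p predictors are split candidates at every split (mtry = p), while a random forest considers only a random subset of the predictors at each split, which decorrelates the trees.

library(randomForest)
set.seed(1)

# Drop the raw Sales column so only the other predictors are used
Carseats.rf <- subset(Carseats, select = -Sales)
p <- ncol(Carseats.rf) - 1   # number of predictors (everything except the response High)

# Bagging: all p predictors are candidates at every split (mtry = p)
bag.carseats <- randomForest(High ~ ., data = Carseats.rf, subset = train, mtry = p)

# Random forest: roughly sqrt(p) candidates per split, which decorrelates the trees
rf.carseats <- randomForest(High ~ ., data = Carseats.rf, subset = train,
                            mtry = floor(sqrt(p)))

# Test-set accuracy of each fit
mean(predict(bag.carseats, Carseats.rf[-train, ]) == High.test)
mean(predict(rf.carseats, Carseats.rf[-train, ]) == High.test)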
A Figures
[Figure 1 image: the unpruned classification tree, with splits on variables such as ShelveLoc, Price, Income, Advertising, CompPrice, Population, Age, and US, and Yes/No labels at the terminal nodes.]
Figure 1: Classification tree of Carseats (before pruning), partitioning the Carseats dataset. If we just enter the fitted tree object in R, the output corresponds to each branch of the tree: it displays the split criterion from top to bottom, the number of observations in each branch, the deviance, the overall prediction of the branch (Yes or No), and the fraction of Yes/No observations in the branch.
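The output the caption describes is what R prints when the fitted object is entered at the console:

tree.carseats            # one line per branch: split criterion, n, deviance, prediction, class fractions
summary(tree.carseats)   # overall fit summary, including the training misclassification error rate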
[Figure 2 image: panel (a) plots deviance against size and against k; panel (b) shows the pruned classification tree with splits on ShelveLoc, Price, Advertising, Age, and CompPrice.]
(a) Pruning the tree: plotting the error rates against both size (number of terminal nodes) and k (the cost-complexity parameter).
(b) Plot of the classification tree after pruning: 9 nodes.
Figure 2: To prune the tree, we can use prune.misclass(). In this case, we pruned it down to a 9-node tree with the command: prune.carseats <- prune.misclass(tree.carseats, best = 9)
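A sketch of how panel (a) can be produced, assuming cv.tree() from the tree package with the misclassification rate as the pruning criterion (the notes only quote the prune.misclass() call itself):

set.seed(3)
cv.carseats <- cv.tree(tree.carseats, FUN = prune.misclass)

# Cross-validated error ("dev" counts misclassifications here) against size and k, as in panel (a)
par(mfrow = c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type = "b", xlab = "size", ylab = "deviance")
plot(cv.carseats$k, cv.carseats$dev, type = "b", xlab = "k", ylab = "deviance")

# Prune to 9 terminal nodes and plot, as in panel (b)
prune.carseats <- prune.misclass(tree.carseats, best = 9)
plot(prune.carseats)
text(prune.carseats, pretty = 0)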
[Figure 3 image: eight dendrogram-style panels (Bins 1, 4, 10, 12, 14, 17, 22, and 25), each with "Height" on the vertical axis.]
(a) Random forest: bagging with 1, 4, 10, and 12 bins.
(b) Random forest: bagging with 14, 17, 22, and 25 bins.
Figure 3: Examples of random forests for the Syria database, with bagging after using the bootstrap to fit the data. Bagging is a general-purpose procedure for reducing the variance of a statistical learning method: we build a number of decision trees on bootstrapped training samples. Each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors.