
ISM seminar, 17 January 2001
Information Geometry on Classification
– Logistic, AdaBoost, Area under ROC curve –
Shinto Eguchi
This talk is based on joint work with Dr J. Copas.
1
Outline
Problem setting for classification: overview of classification methods
[ http://juban.ism.ac.jp/ ]
Dw divergence of discriminant functions: definition from the NP Lemma; expected and observed expressions
Examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening
Structure of Dw risk functions: optimal Dw under near-logistic, implemented by cross-validation
Risk scores of skin cancer: area under ROC curve, comparison, discussion of other methods
2
Standard methods
 Fisher linear discriminant analysis [4]
 Logistic regression [ Cornfield, 1962]
 Multilayer perceptron
[ http://juban.ism.ac.jp/file_ppt/公開講座(ニューラル).ppt]
New approaches
 Boosting – combining weak learners –
AdaBoost
[http://juban.ism.ac.jp/file_ppt/公開講座(Boost).ppt]
 Support vector machine – VC dimension –
[http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]
 Kernel method – Mercer theorem –
[http://juban.ism.ac.jp/file_ppt/主成分発表原稿.ppt]
3
Problem setting
input vector x ∈ X ⊆ R^p
output variable y ∈ {1, …, K}
Definition: h : X → {1, …, K} is a classifier if h is onto.
X = D_1 ⊕ … ⊕ D_K (direct sum), where D_k = h^{-1}(k) is the k-th decision space.
4
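The classifier definition above can be sketched concretely. The 1-D example, labels, and thresholds here are illustrative, not from the talk:

```python
def h(x):
    """A toy 1-D classifier with K = 3 labels, split at thresholds 0 and 1."""
    if x < 0.0:
        return 1
    elif x < 1.0:
        return 2
    return 3

# The decision spaces D_k = {x : h(x) = k} are disjoint and cover the inputs:
points = [-2.0, -0.5, 0.3, 0.9, 1.5, 4.0]
D = {k: [x for x in points if h(x) == k] for k in (1, 2, 3)}
print(D)  # {1: [-2.0, -0.5], 2: [0.3, 0.9], 3: [1.5, 4.0]}
```

Each point lands in exactly one D_k, which is the direct-sum decomposition of the input space.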
Probabilistic model
Joint distribution of x, y:
p(x, y) = π(y) p(x | y),
where π(y) is the prior distribution and p(x | y) is the conditional distribution of x given y.
Misclassification:
error rate err(h) = P( h(x) ≠ y )
hit rate 1 − err(h)
5
discriminant function and classifier
Bayes rule: given P(x, y), classify x to the class k maximizing the posterior, i.e. maximizing π(k) p(x | k).
Training data (examples): (x_1, y_1), …, (x_n, y_n)
x_i : i-th input
y_i : i-th output
6
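A minimal sketch of the Bayes rule above, assuming two univariate Gaussian class-conditionals with equal priors (the densities and priors are illustrative choices):

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

prior = {0: 0.5, 1: 0.5}   # pi(y)
mu = {0: 0.0, 1: 2.0}      # class-conditional means, common sigma = 1

def bayes_rule(x):
    # classify to the class maximizing prior * class-conditional density
    return max(prior, key=lambda k: prior[k] * gauss_pdf(x, mu[k], 1.0))

print(bayes_rule(-1.0))  # 0
print(bayes_rule(3.0))   # 1
```

With equal priors and equal variances the decision boundary sits midway between the means, at x = 1.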
Reduction of our problem to binary classification
output variable y ∈ {0, 1}
discriminant function F(x)
classifier h(x) = 1 if F(x) > 0, else 0
error rate err(h) = P( h(x) ≠ y )
log-likelihood ratio λ(x) = log { p(x | 1) / p(x | 0) }
7
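In the binary case the log-likelihood ratio can serve directly as the discriminant function F. A sketch under an assumed two-Gaussian model (means and variance are illustrative), where F turns out linear in x:

```python
def log_lik_ratio(x, mu0=0.0, mu1=2.0, sigma=1.0):
    """log p(x|1) - log p(x|0) for N(mu1, s^2) vs N(mu0, s^2).

    The quadratic terms cancel, leaving a linear score:
    F(x) = (mu1 - mu0)/s^2 * x - (mu1^2 - mu0^2)/(2 s^2)
    """
    return (mu1 - mu0) / sigma**2 * x - (mu1**2 - mu0**2) / (2 * sigma**2)

classify = lambda x: 1 if log_lik_ratio(x) > 0 else 0

print(log_lik_ratio(1.0))  # 0.0: the decision boundary is x = 1
print(classify(2.5))       # 1
```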
Other loss functions for classification
Credit scoring [5]
A cost model: a profit if y = 1; a loss if y = 0.
General setting
Let c(y, k) be the cost of classifying y as k.
The expected cost is E[ c(y, h(x)) ].
8
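The general cost setting can be made concrete with a small discrete example. The cost values and joint distribution below are illustrative only (a negative cost encodes the credit-scoring profit for a correct acceptance):

```python
# c[y][k]: cost of classifying a case with true label y as k (assumed values)
cost = {0: {0: 0.0, 1: 5.0},   # a false positive costs 5
        1: {0: 1.0, 1: -2.0}}  # a hit earns a profit of 2 (negative cost)

# A tiny discrete joint distribution P(x, y) over x in {0, 1, 2}:
P = {(0, 0): 0.3, (1, 0): 0.1, (2, 0): 0.1,
     (0, 1): 0.05, (1, 1): 0.15, (2, 1): 0.3}

h = lambda x: 1 if x >= 1 else 0  # a toy threshold classifier

# expected cost E[ c(y, h(x)) ] under the joint distribution
expected_cost = sum(p * cost[y][h(x)] for (x, y), p in P.items())
print(round(expected_cost, 3))  # 0.15
```

Different cost matrices change which classifier minimizes the expected cost, which is the point of moving beyond plain error rate.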
ROC (Receiver Operating Characteristic) curve
The four outcomes of a binary decision:
             truth y = 0         truth y = 1
classify 0 | correct rejection | false negative
classify 1 | false positive    | hit
9
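The ROC curve traces hit rate against false positive rate as the score threshold varies; the area under it is a threshold-free summary of a discriminant score. A self-contained sketch (scores and labels are made up):

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, hit rate), one per distinct threshold."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in thresholds:
        hit = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1) / pos
        fpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / neg
        pts.append((fpr, hit))
    return pts

def auc(pts):
    # trapezoidal rule over the (false positive rate, hit rate) points
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))  # 8/9: the fraction of (positive, negative) pairs ranked correctly
```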
Main story
Given training data:
linear discriminant function
objective function
proposed estimator
What is (U, V)?
Logistic is OK.
10
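One way to read the (U, V) question: the objective plausibly charges one function of the score to each class. The exact form below is an assumption for illustration, not the slides' notation; with the softplus choice U = V it reduces to the logistic negative log-likelihood, matching "Logistic is OK":

```python
import math

def objective(b0, b1, xs, ys, U, V):
    """Assumed form: sum U(-F(x_i)) over y_i = 1 plus V(F(x_i)) over y_i = 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        F = b0 + b1 * x
        total += U(-F) if y == 1 else V(F)
    return total

softplus = lambda t: math.log1p(math.exp(t))  # U = V = softplus: logistic regression

xs = [-1.0, -0.5, 0.2, 0.8, 1.5]
ys = [0, 0, 1, 0, 1]  # not linearly separable, so the minimizer is finite

# crude grid search standing in for a proper optimizer
best = min(
    ((objective(b0 / 4, b1 / 4, xs, ys, softplus, softplus), b0 / 4, b1 / 4)
     for b0 in range(-20, 21) for b1 in range(0, 41)),
    key=lambda t: t[0],
)
print("fitted (b0, b1):", best[1], best[2])
```

Other (U, V) pairs in the same skeleton give the other examples in the talk (hit rate, AdaBoost, AUC).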
A reinterpretation of the Neyman-Pearson Lemma
log-likelihood ratio
discriminant function
Proposition
Remark
11
Proof of Proposition
12
Divergence Dw of discriminant function
Def.
Expectation expression
13
Proof
14
Sample expression given a set of training data
Minimum Dw method for a statistical model F
15
Examples of Dw divergence
(1) logistic regression
(2) Hit rate, Credit scoring, medical screening
16
(3) Area under ROC curve
(4) AdaBoost
This Dw is the loss function of AdaBoost, cf. [7], [8].
17
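Examples (1) and (4) can be compared through their pointwise margin losses. A sketch with labels y in {-1, +1} and margin m = yF(x); the scaling and bounds below are standard facts, not the slide's derivation:

```python
import math

logistic_loss = lambda m: math.log1p(math.exp(-m))  # logistic regression, example (1)
exp_loss = lambda m: math.exp(-m)                   # AdaBoost, example (4)
zero_one = lambda m: 1.0 if m < 0 else 0.0          # misclassification indicator

# Both smooth, convex losses dominate the 0-1 loss
# (the logistic loss after scaling by 1/log 2):
for m in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert exp_loss(m) >= zero_one(m)
    assert logistic_loss(m) / math.log(2) >= zero_one(m)
print("both margin losses upper-bound the 0-1 loss on the sampled margins")
```

The exponential loss penalizes large negative margins much more severely than the logistic loss, which is one source of AdaBoost's sensitivity to outliers discussed later in the talk.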
Structure of Dw risk functions
optimal Dw under near-logistic, implemented by cross-validation
Logistic (linear) parametric model
model distribution of x, y:
18
Estimating equation of minimum Dw methods
Remark
19
Parametric assumption
Cauchy-Schwarz inequality
20
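The slide's formulas did not survive extraction; the step presumably invoked is the generic Cauchy-Schwarz bound for expectations (the notation here is an assumption), which gives an asymptotic-variance lower bound attained when the weight function is proportional to the logistic score:

```latex
\bigl( \mathbb{E}[\,s\,u\,] \bigr)^2 \;\le\; \mathbb{E}[\,s^2\,]\,\mathbb{E}[\,u^2\,],
\qquad \text{with equality iff } u \propto s \ \text{almost surely.}
```

Under the (near-)logistic parametric assumption this is how the logistic choice of Dw comes out as optimal.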
Near-Parametric assumption
21
Our risk function of an estimator
is
But our situation is
Let
the bias term is
where
the variance term is
Cross-validated risk estimate
where
is the estimate from the training data, leaving the i-th example out.
22
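The leave-one-out idea on this slide: refit the estimator n times, each time leaving out the i-th example, and average the loss of each left-out prediction. Shown here for a simple mean estimator with squared error; the talk applies the same scheme to the Dw risk of a discriminant score:

```python
def loo_risk(data, fit, loss):
    """Leave-one-out cross-validated risk of the estimator `fit` under `loss`."""
    n = len(data)
    total = 0.0
    for i in range(n):
        held_out = data[i]
        est = fit(data[:i] + data[i + 1:])  # estimate without example i
        total += loss(est, held_out)
    return total / n

mean = lambda xs: sum(xs) / len(xs)
sq = lambda est, x: (est - x) ** 2

print(loo_risk([1.0, 2.0, 3.0, 4.0], mean, sq))  # 20/9
```

Each example is scored by an estimate that never saw it, so the average is an (approximately) unbiased risk estimate.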
Outlier
For
24
Note :
where
25
References
[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.
[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.
[3] Eguchi, S. and Copas, J. (2000). A class of logistic-type discriminant functions. Technical Report, Department of Statistics, University of Warwick.
[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.
[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc. A 160, 523-541.
[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.
[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.
[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.