ISM seminar on 17/1/2001

Information Geometry on Classification: Logistic, AdaBoost, Area under ROC curve

Shinto Eguchi

This talk is based on one of my joint works with Dr J Copas.

Outline
  Problem setting for classification
    overview of classification methods [ http://juban.ism.ac.jp/ ]
  Dw classifications
    Dw divergence of discriminant functions
    definition from NP Lemma, expected and observed expressions
    examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening
    structure of Dw risk functions
    optimal Dw under near-logistic
    implement by cross-validation
  Risk scores of skin cancer
    area under ROC curve, comparison
    discussion on other methods

Standard methods
  Fisher linear discriminant analysis [4]
  Logistic regression [Cornfield, 1962]
  Multilayer perceptron [ http://juban.ism.ac.jp/file_ppt/公開講座(ニューラル).ppt ]

New approaches
  Boosting (combining weak learners), AdaBoost [ http://juban.ism.ac.jp/file_ppt/公開講座(Boost).ppt ]
  Support vector machine (VC dimension) [ http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt ]
  Kernel method (Mercer theorem) [ http://juban.ism.ac.jp/file_ppt/主成分発表原稿.ppt ]

Problem setting
  input vector
  output variable
  Definition: a map from input vectors to output values is a classifier if it is onto.
  (direct sum)
  the k-th decision space

Probabilistic model
  Joint distribution of (x, y), where
    prior distribution of y
    conditional distribution of x given y
  Misclassification error rate
  hit rate

  discriminant function
  classifier
  Bayes rule, given P(x, y)
  Training data (examples): i-th input, i-th output

Reduction of our problem to binary classification
  output variable
  discriminant function
  classifier
  error rate
  log-likelihood ratio

Other loss functions for classification
  Credit scoring [5]
  A cost model: a profit if y = 1; a loss if y = 0.
  General setting: a cost is attached to classifying y as a given class.
  The expected cost is

ROC (Receiver Operating Characteristic) curve
  correct rejection, false positive, false negative, hit

Main story
  Given a training data set
  linear discriminant function
  objective function
  proposed estimator
  What is (U, V)? Logistic is OK.

A reinterpretation of the Neyman-Pearson Lemma
  log-likelihood ratio
  discriminant function
  Proposition
  Remark

Proof of Proposition

Divergence Dw of discriminant function
  Def.
  Expectation expression

Proof

Sample expression
  given a set of training data
  Minimum Dw method for a statistical model F

Examples of Dw divergence
  (1) Logistic regression
  (2) Hit rate, credit scoring, medical screening
  (3) Area under ROC curve
  (4) AdaBoost
  This Dw is the loss function of AdaBoost, cf. [7], [8].

Structure of Dw risk functions
  optimal Dw under near-logistic
  implement by cross-validation
  Logistic (linear) parametric model
  model distribution of (x, y):

Estimating equation of minimum Dw methods
  Remark

Parametric assumption
  Cauchy-Schwarz inequality

Near-parametric assumption

  Our risk function of an estimator is
  But our situation is
  Let
  the bias term is
  where the variance term is
  Cross-validated risk estimate
  where the estimate is obtained from the training data by leaving the i-th example out.

Outlier
  For

Note:
  where

References
[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc., 93, 415-426.
[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst., 88, 17-23.
[3] Eguchi, S. and Copas, J. (2000). A Class of Logistic-type Discriminant Functions. Technical Report of Department of Statistics, University of Warwick.
[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179-188.
[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc., A, 160, 523-541.
[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.
[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist., 26, 1651-1686.
[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.
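
Supplementary sketch
  The criteria listed under "Examples of Dw divergence" (logistic regression, hit rate, area under ROC curve, AdaBoost) can be compared numerically on a toy sample. The code below is a minimal sketch and not the talk's own definition of Dw, whose formulas are not reproduced here. It assumes the standard textbook forms of each criterion, binary labels y in {0, 1}, and a linear discriminant function with made-up coefficients and data. All function names are hypothetical.

```python
import math


def linear_scores(beta, xs):
    """Linear discriminant function F(x) = beta[0] + sum_j beta[j] * x[j-1]."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in xs]


def logistic_loss(f, y):
    """Mean negative log-likelihood of logistic regression, y in {0, 1}.

    Uses log(1 + exp(F)) - y * F, with log(1 + exp(F)) computed stably.
    """
    total = 0.0
    for fi, yi in zip(f, y):
        total += max(fi, 0.0) + math.log1p(math.exp(-abs(fi))) - yi * fi
    return total / len(y)


def exponential_loss(f, y):
    """Mean exponential loss exp(-s * F), s = 2y - 1 in {-1, +1} (AdaBoost)."""
    return sum(math.exp(-(2 * yi - 1) * fi) for fi, yi in zip(f, y)) / len(y)


def error_rate(f, y, threshold=0.0):
    """Fraction of examples misclassified by the rule 'predict 1 if F > threshold'."""
    return sum((fi > threshold) != (yi == 1) for fi, yi in zip(f, y)) / len(y)


def auc(f, y):
    """Empirical area under the ROC curve (Mann-Whitney form, ties count 1/2)."""
    pos = [fi for fi, yi in zip(f, y) if yi == 1]
    neg = [fi for fi, yi in zip(f, y) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data and coefficients, invented purely for illustration.
xs = [[-2.0], [-1.0], [0.5], [1.0], [2.0], [-0.5]]
y = [0, 0, 0, 1, 1, 1]
beta = [0.0, 1.0]

f = linear_scores(beta, xs)
print("logistic loss   :", logistic_loss(f, y))
print("exponential loss:", exponential_loss(f, y))
print("error rate      :", error_rate(f, y))
print("hit rate        :", 1.0 - error_rate(f, y))
print("AUC             :", auc(f, y))
```

  On this toy sample two of the six examples fall on the wrong side of the threshold, so the error rate is 1/3, and eight of the nine positive-negative pairs are correctly ordered, so the AUC is 8/9. The logistic and exponential losses are smooth in the coefficients, while the error rate and AUC change only when an example crosses the threshold or a pair changes order.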