Japanese Dependency Analysis using Cascaded Chunking

Taku Kudo (工藤 拓), Yuji Matsumoto (松本 裕治)
Nara Institute of Science and Technology, Japan

Motivation

• Kudo and Matsumoto 2000 (VLC) presented a state-of-the-art Japanese dependency parser using SVMs (89.09% on the standard dataset).
• That work showed the high generalization performance and feature-selection ability of SVMs.
• Problems
  - Not scalable: about 2 weeks of training on 7,958 sentences; hard to train with larger data.
  - Slow in parsing: 2-3 sec./sentence; too slow for actual NL applications.

Goal

• Improve scalability and parsing efficiency without losing accuracy!
• How?
  - Apply a cascaded chunking model to dependency parsing and to the selection of training examples.
  - Reduce the number of times SVMs are consulted during parsing.
  - Reduce the number of negative examples to be learned.

Outline

• Japanese dependency analysis
• Two models
  - Probabilistic model (previous)
  - Cascaded chunking model (new!)
• Features used for training and classification
• Experiments and results
• Conclusion and future work

Japanese Dependency Analysis (1/2)

• Analysis of the relationships between phrasal units called bunsetsu (segments), comparable to base phrases in English.
• Two constraints:
  - Each segment modifies one of the segments to its right (Japanese is a head-final language).
  - Dependencies do not cross each other.

Japanese Dependency Analysis (2/2)

• Raw text: 私は彼女と京都に行きます (I go to Kyoto with her.)
• Morphological analysis and bunsetsu identification: 私は / 彼女と / 京都に / 行きます (I-top / with her / Kyoto-loc / go)
• Dependency analysis: 私は → 行きます, 彼女と → 行きます, 京都に → 行きます

Probabilistic Model

• Input: 私は1 / 彼女と2 / 京都に3 / 行きます4 (I-top / with her / Kyoto-loc / go)
• Step 1: Build a dependency matrix with ME, DT, or SVMs, giving how probable it is that one segment modifies another.
• Step 2: Search for the dependencies that maximize the sentence probability, using CYK or chart parsing.

  Dependency matrix (modifier rows, modifiee columns):

  modifier \ modifiee    2     3     4
  1                      0.1   0.2   0.7
  2                            0.2   0.8
  3                                  1.0

• Output: 私は1 / 彼女と2 / 京都に3 / 行きます4 with the most probable dependencies.

Problems of the Probabilistic Model (1/2)

• Selection of training examples: all candidate pairs of two segments are used.
  - Pair with a dependency relation → positive example
  - Pair without a dependency relation → negative example
• This straightforward selection requires n(n-1)/2 training examples per sentence (n = number of segments in the sentence).
• This makes it difficult to combine the probabilistic model with SVMs, whose training cost is polynomial in the number of examples.

Problems of the Probabilistic Model (2/2)

• O(n³) parsing time is necessary with CYK or chart parsing.
• Even if beam search is applied, O(n²) parsing time is always necessary.
• The classification cost of SVMs is much higher than that of other ML algorithms such as ME and DT.

Cascaded Chunking Model

• Inspired by cascaded chunking for English parsing [Abney 1991].
• Parses a sentence deterministically, deciding only whether the current segment modifies the segment on its immediate right-hand side.
• Training examples are extracted using this algorithm itself.

Example: Training Phase

• Annotated sentence: 彼は1 彼女の2 温かい3 真心に4 感動した。5 (he / her / warm / heart / was moved; "He was moved by her warm heart.")
• Each pass tags every remaining segment D (modifies the segment on its immediate right) or O (does not); D-tagged segments have their dependency fixed and are removed, and the passes repeat until one segment remains:
  - 彼は-O 彼女の-O 温かい-D 真心に-O 感動した。
  - 彼は-O 彼女の-D 真心に-D 感動した。
  - 彼は-D 感動した。
  - 感動した。 (finish)
• The tags are decided by the annotated corpus.
• Pairs of tag (D or O) and context (features) are stored as training data for the SVMs.

Example: Test Phase

• Test sentence: 彼は1 彼女の2 温かい3 真心に4 感動した。5 (He was moved by her warm heart.)
• The same deterministic procedure is applied, but the tags are decided by the SVMs built in the training phase (a code sketch of this loop follows).
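The test-phase procedure above can be written as a short deterministic loop. The following is a minimal sketch rather than the authors' implementation: segments are abstracted to indices, `classify` is a placeholder for the trained SVM classifier (in the real parser it would receive the static and dynamic features described later), and the early exit when no segment is tagged D is a simplification to guarantee termination.

```python
def cascaded_chunking_parse(segments, classify):
    """Deterministic cascaded-chunking parse (sketch).

    segments : the bunsetsu of one sentence, in order
    classify : callable(current_segments, position) -> "D" or "O";
               placeholder for the SVM classifier of the slides
    Returns {modifier_index: head_index} over original positions.
    """
    remaining = list(range(len(segments)))   # indices into the original sentence
    heads = {}

    while len(remaining) > 1:
        current = [segments[i] for i in remaining]
        # Decide, for every segment except the last, whether it modifies
        # the segment on its immediate right in the current (reduced) sequence.
        tags = [classify(current, pos) for pos in range(len(remaining) - 1)]
        if "D" not in tags:
            break                            # simplification: stop if nothing attaches

        kept = []
        for pos, idx in enumerate(remaining):
            if pos < len(tags) and tags[pos] == "D":
                heads[idx] = remaining[pos + 1]   # head decided; leaves the sequence
            else:
                kept.append(idx)                  # still needs a head (or is the last segment)
        remaining = kept

    return heads
```

Because most segments modify the segment immediately to their right, most of them are removed within the first few passes, which is why the expected number of classifier calls stays well below the O(n²) worst case noted on the next slide.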
Advantages of the Cascaded Chunking Model

• Simple and efficient
  - Probabilistic model: O(n³) vs. cascaded chunking: O(n²).
  - In practice the cost is lower than O(n²), since most segments modify the segment on their immediate right-hand side.
  - The number of training examples is much smaller.
• Independent of the ML algorithm
  - Can be combined with any ML algorithm that works as a binary classifier.
  - Probabilities of dependency are not necessary.

Features

• Example (modify or not?): 彼の1 友人は2 この本を3 持っている4 女性を5 探している6 (his / friend-top / this book-acc / have / lady-acc / be looking for; "His friend is looking for a lady who has this book."), with 友人は as the modifier, 探している as the modifiee, and A = この本を, B = 彼の, C = 持っている marking the segments used for dynamic features.
• Static features of the modifier and the modifiee
  - Head/functional word: surface form, POS, POS subcategory, inflection type, inflection form
  - Brackets, quotations, punctuation, position
• Static features between the two segments: distance, case particles, brackets, quotations, punctuation
• Dynamic features [Kudo, Matsumoto 2000]
  - A, B: static features of the functional word
  - C: static features of the head word

Experimental Setting

• Kyoto University Corpus 2.0/3.0
• Standard data set
  - Training: 7,958 sentences / Test: 1,246 sentences
  - Same data as [Uchimoto et al. 98, Kudo, Matsumoto 00]
• Large data set
  - 2-fold cross-validation using all 38,383 sentences
• Kernel function: 3rd-degree polynomial
• Evaluation: dependency accuracy and sentence accuracy

Results

• Standard data set (7,956 training sentences)
  - Cascaded chunking: dependency acc. 89.29%, sentence acc. 47.53%, 110,355 training examples, 8 hours training, 0.5 sec./sentence parsing
  - Probabilistic: dependency acc. 89.09%, sentence acc. 46.17%, 459,105 training examples, 336 hours training, 2.1 sec./sentence parsing
• Large data set (19,191 training sentences)
  - Cascaded chunking: dependency acc. 90.04%, sentence acc. 53.16%, 251,254 training examples, 48 hours training, 0.7 sec./sentence parsing
  - Probabilistic: 1,074,316 training examples; training was not feasible (accuracy, training time, and parsing time: N/A)

Effect of Dynamic Features

• Same example as the Features slide: 彼の1 友人は2 この本を3 持っている4 女性を5 探している6, with dynamic-feature segments A, B, and C.
• Differences from the model trained with all dynamic features when each type is deleted:

  Deleted dynamic features   Dependency acc.   Sentence acc.
  A                          -0.28%            -0.89%
  B                          -0.10%            -0.89%
  C                          -0.28%            -0.56%
  AB                         -0.33%            -1.21%
  AC                         -0.55%            -0.97%
  BC                         -0.54%            -1.61%
  ABC                        -0.58%            -2.34%

Probabilistic vs. Cascaded Chunking (1/2)

• The probabilistic model uses all candidate dependency pairs as training data.
• Example: 彼は1 この本を2 持っている3 女性を4 探している5 (he-top / this book-acc / have / lady-acc / be looking for; "He is looking for a lady who has this book.")
  - Positive: この本を2 → 持っている3
  - Negative: この本を2 → 探している5 (unnecessary: once この本を is attached to 持っている, this pair never needs to be considered)
• The probabilistic model thus generates many unnecessary training examples (see the counting sketch after the next slide).

Probabilistic vs. Cascaded Chunking (2/2)

             Probabilistic                                  Cascaded chunking
  Strategy   Maximize the sentence probability              Shift-reduce-like, deterministic
  Merit      Can see all candidate modifiees                Simple, efficient, and scalable; as accurate as the probabilistic model
  Demerit    Inefficient; trained on unnecessary examples   Cannot see all (posterior) candidate modifiees
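To make the contrast concrete, here is a small counting sketch, not taken from the paper, that builds the training pairs both ways for the example sentence above. The probabilistic side enumerates all n(n-1)/2 modifier/candidate pairs; the cascaded-chunking side simulates the training phase with gold tags. The D/O rule used in the simulation (a segment is tagged D only when its gold head is the next remaining segment and nothing still depends on it) is one consistent reading of the training-phase trace shown earlier, not a statement of the authors' exact implementation.

```python
from itertools import combinations

def probabilistic_examples(heads):
    """All modifier/candidate pairs, as in the probabilistic model.
    heads maps each modifier index to its gold head index (the last of the
    n segments has no head), so n = len(heads) + 1 and there are n*(n-1)/2 pairs."""
    n = len(heads) + 1
    pos, neg = [], []
    for i, j in combinations(range(n), 2):          # every pair i < j
        (pos if heads[i] == j else neg).append((i, j))
    return pos, neg

def cascaded_chunking_examples(heads):
    """Tag/context pairs produced by simulating the cascaded-chunking
    training phase with gold tags (D/O rule as described in the lead-in)."""
    n = len(heads) + 1
    attached = set()          # segments whose head has been fixed
    examples = []             # (modifier, candidate head, gold tag)
    while len(attached) < n - 1:
        remaining = [s for s in range(n) if s not in attached]
        progressed = False
        for pos in range(len(remaining) - 1):
            seg, nxt = remaining[pos], remaining[pos + 1]
            waiting = any(h == seg and m not in attached
                          for m, h in heads.items())
            tag = "D" if heads[seg] == nxt and not waiting else "O"
            examples.append((seg, nxt, tag))
            if tag == "D":
                attached.add(seg)     # head fixed; removed before the next pass
                progressed = True
        if not progressed:
            break                     # safety guard for malformed gold trees
    return examples

# 彼は1 この本を2 持っている3 女性を4 探している5, as 0-based indices
gold = {0: 4, 1: 2, 2: 3, 3: 4}
pos, neg = probabilistic_examples(gold)
print(len(pos) + len(neg))                    # 10 pairs (= 5*4/2)
print(len(cascaded_chunking_examples(gold)))  # 5 decisions for this sentence
```

Over the whole corpus this gap is what the results slide reports: 110,355 vs. 459,105 training examples on the standard data set.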
Conclusion

• A new Japanese dependency parser using a cascaded chunking model.
• It outperforms the previous probabilistic model with respect to accuracy, efficiency, and scalability.
• Dynamic features contribute significantly to the performance.

Future Work

• Coordinate structure analysis: coordinate structures appear frequently in long Japanese sentences and make analysis hard.
• Use of posterior context: sentences such as the following are hard to parse using only the cascaded chunking model:
  僕の 母の ダイヤの 指輪 (my / mother's / diamond / ring; "my mother's diamond ring")

Comparison with Related Work

  Model                                      Training corpus (# of sentences)   Acc. (%)
  Our model: Cascaded chunking + SVMs        Kyoto Univ. (19,191)               90.46
  Our model: Cascaded chunking + SVMs        Kyoto Univ. (7,956)                89.29
  Kudo et al. 00: Prob. + SVMs               Kyoto Univ. (7,956)                89.09
  Uchimoto et al. 00: Prob. + ME             Kyoto Univ. (7,956)                87.93
  Kanayama et al. 00: Prob. + ME + HPSG      EDR (192,778)                      88.55
  Haruno et al. 98: Prob. + DT + Boosting    EDR (50,000)                       85.03
  Fujio et al. 98: Prob. + ML                EDR (190,000)                      86.67

Support Vector Machines [Vapnik]

• Training examples x_i with labels y_i in {+1, -1} are separated by the hyperplane w·x + b = 0, with margin hyperplanes w·x + b = +1 and w·x + b = -1.
• Maximize the margin: the distance between the two margin hyperplanes is 2/||w||, so maximizing the margin means minimizing ||w||.
• Optimization problem:
  Min.:  L(w) = ||w||²/2
  s.t.:  y_i [(w·x_i) + b] ≥ 1
• Soft margin (allows some training errors when the data are not separable).
• Kernel function (replaces the dot product to obtain non-linear decision boundaries; a small classifier sketch follows below).
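As a concrete illustration of the binary D/O classifier used throughout, and of the 3rd-degree polynomial kernel from the experimental setting, here is a minimal sketch assuming scikit-learn, which is not what the authors used. The feature names are made up for illustration and merely stand in for the static and dynamic features listed on the Features slide.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# Toy training pairs: sparse categorical features describing one
# (modifier, candidate modifiee) decision, paired with the gold D/O tag.
X_dicts = [
    {"mod_head_pos": "NOUN", "mod_func_surface": "は",
     "head_pos": "VERB", "distance": "1"},
    {"mod_head_pos": "NOUN", "mod_func_surface": "の",
     "head_pos": "ADJ", "distance": "1"},
]
y = ["D", "O"]

vec = DictVectorizer()          # binarizes the categorical features
X = vec.fit_transform(X_dicts)

# Polynomial kernel (gamma * x·x' + coef0)^degree; with these settings
# it is the common (x·x' + 1)^3 form of a 3rd-degree polynomial kernel.
clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0)
clf.fit(X, y)

test = {"mod_head_pos": "NOUN", "mod_func_surface": "は",
        "head_pos": "VERB", "distance": "1"}
print(clf.predict(vec.transform([test])))   # -> ['D'] or ['O']
```

In the parser sketched earlier, a classifier like this would be consulted once per D/O decision, so the total number of SVM calls per sentence is exactly the number of examples counted by the cascaded-chunking simulation above.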