Learning-Based Argument Structure Analysis of Event-Nouns in Japanese
Mamoru Komachi, Ryu Iida, Kentaro Inui and Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology, JAPAN
19 September 2007

Slide 2: Our goal
The same event can be expressed verbally or as a nominalization:
- Our city, destroyed by the atomic bomb
- Our city was destroyed by the atomic bomb
- The atomic bomb destroyed our city
- the destruction of our city by the atomic bomb
In every case the predicate "destroy" takes the same arguments: CAUSE = the atomic bomb, UNDERGOER = our city. Recovering this shared argument structure benefits IE, MT, summarization, and other applications.

Slide 3: Argument structure of event-nouns
  Kanojo-kara  denwa-ga   ki-ta
  she-ABL      phone-NOM  come-PAST
  (She phoned me.)
The verb "come" takes "phone" as its NOM argument, but the event-noun "phone (call)" has an argument structure of its own: "she" is its logical NOM (the caller) and the speaker ("me") its DAT (the callee). Logical cases of event-nouns are often not marked by case markers; here the caller surfaces with ABL.

Slide 4: Task setting
  Tom-ga   kinou      denwa-o    ka-tta
  Tom-NOM  yesterday  phone-ACC  buy-PAST
  (Tom bought a phone yesterday.)
"buy" takes Tom (NOM) and phone (ACC); whether "phone" itself denotes an event here, and if so what its arguments are, is what must be decided. The task therefore has two steps:
1. Event classification (determine eventhood)
2. Argument identification

Slide 5: Outline
- Introduction
- Argument structure analysis of event-nouns
  - Event classification
  - Argument identification
- Conclusion
- Future work

Slide 6: Unsupervised learning of patterns
Positive instances (having eventhood), e.g. "persuasion", "destruction":
  "... conducted destruction of documents ..." — the noun depends on a verb within the same phrase.
Negative instances (not having eventhood), common nouns such as "chair", "desk":
  "... a little chair around ..." — the noun is modified by an adjective or preposition within the same phrase.
Each instance is encoded as a tree using surface text, POS, dependency relations, etc., and contextual patterns are learned as subtrees by BACT, a boosting algorithm over trees (Kudo and Matsumoto, 2004).

Slide 7: Experiments of event classification
Method: classify the eventhood of event-nouns with Support Vector Machines.
Data: 80 news articles (800 sentences); 1,237 event-nouns, of which 590 have eventhood.
Features:
- Grammatical — e.g. HeadPOS: CommonNoun
- Semantic — e.g. SemanticCategory: Animate
- Contextual — e.g. FollowsVerbalNoun: 1

Slide 8: Results of event classification
  Method                    Prec.  Rec.   F
  Baseline (predominant)    60.4   88.2   71.7
  Proposed (unsupervised)   73.3   80.2   76.6
Baseline: always use the first sense determined by corpus statistics (NAIST Text Corpus). Proposed: the machine-learning-based classifier.
Precision = correct / event-nouns the system classifies as having eventhood. Recall = correct / all event-nouns with eventhood in the corpus.
The contextual patterns make the proposed method outperform the baseline in precision and F, and further gains are expected from adding more data.

Slide 9: Outline
- Introduction
- Argument structure analysis of event-nouns
  - Event classification
  - Argument identification
- Conclusion
- Future work

Slide 10: Argument identification
A classifier is built using the tournament model (Iida et al., 2006).
  日本        政府 による      民間            支援 が      活性化 した。
  Japanese   government-BY   private sector  support-NOM  activate-PAST
  (The support for the private sector by the Japanese government was activated.)
To identify an argument of the event-noun 支援(する) "support", the candidate antecedents (日本, 政府, 民間, 活性) play pairwise matches. In training, each candidate pair is labeled with its winner, e.g. (日本, 政府) → R: 政府 and (政府, 民間) → L: 政府; in decoding, the overall tournament winner (here 政府) is output as the argument.

Slide 11: Calculation of PMI using pLSI
Point-wise mutual information between a verb–case pair <v,c> and a noun n that depends on verb v through case marker c (Fujita et al., 2004) is estimated with Probabilistic Latent Semantic Indexing (Hofmann, 1999). pLSI reduces dimensionality through a hidden class z:

  P(<v,c>, n) = Σ_{z∈Z} P(<v,c> | z) P(n | z) P(z)

  PMI(<v,c>, n) = log [ P(<v,c>, n) / ( P(<v,c>) P(n) ) ]

Example: "... pay for the shoes" yields the pair <pay, for> and the noun "shoes".

Slide 12: Case alignment dictionary
In NomBank, 20% of the arguments that occur outside the NP appear in support verb constructions (Jiang and Ng, 2006). Case alternation example:
  kare-ga  kanojo-ni  benkyo-o   oshie-ta
  he-NOM   her-DAT    study-ACC  teach-PAST
  (He taught a lesson to her.)
  kanojo-ga  benkyo-shita
  she-NOM    study-PAST
  (She studied.)
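The pLSI-based PMI estimate on slide 11 can be illustrated with a minimal sketch. This is not the authors' implementation: the hidden classes, vocabulary, and probability tables below are toy values invented for illustration, and a real system would fit them with EM over verb–case–noun triples from a dependency-parsed corpus.

```python
import math

# Toy pLSI parameters (invented for illustration): hidden classes z,
# with P(z), P(<v,c> | z), and P(n | z) given as dictionaries.
P_z = {0: 0.6, 1: 0.4}
P_vc_given_z = {
    0: {("pay", "for"): 0.5, ("buy", "obj"): 0.5},
    1: {("pay", "for"): 0.1, ("buy", "obj"): 0.9},
}
P_n_given_z = {
    0: {"shoes": 0.7, "phone": 0.3},
    1: {"shoes": 0.2, "phone": 0.8},
}

def p_joint(vc, n):
    """P(<v,c>, n) = sum_z P(<v,c>|z) * P(n|z) * P(z)  (pLSI decomposition)."""
    return sum(P_vc_given_z[z].get(vc, 0.0) * P_n_given_z[z].get(n, 0.0) * P_z[z]
               for z in P_z)

def p_vc(vc):
    """Marginal P(<v,c>), summing the joint over all nouns."""
    nouns = {n for z in P_z for n in P_n_given_z[z]}
    return sum(p_joint(vc, n) for n in nouns)

def p_n(n):
    """Marginal P(n), summing the joint over all verb-case pairs."""
    pairs = {vc for z in P_z for vc in P_vc_given_z[z]}
    return sum(p_joint(vc, n) for vc in pairs)

def pmi(vc, n):
    """PMI(<v,c>, n) = log [ P(<v,c>, n) / ( P(<v,c>) P(n) ) ]."""
    return math.log(p_joint(vc, n) / (p_vc(vc) * p_n(n)))

print(round(pmi(("pay", "for"), "shoes"), 3))  # → 0.249
```

In the real model the smoothed joint P(<v,c>, n) also assigns nonzero probability to unseen pairs, which is the point of the dimension reduction: co-occurrence scores stay usable even for combinations absent from the training corpus.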
The dictionary stores alignments such as (ACC_event, oshie-ru) = DAT_pred → NOM_event (teach): when an event-noun is the ACC of oshie-ru "teach", the DAT of the predicate fills the NOM slot of the event-noun.

Slide 13: Experiments of argument identification
Method: apply the Japanese zero-anaphora resolution model (Iida et al., 2006) to the argument identification task. The two tasks are analogous: both lack case markers, and event classification corresponds to the anaphoricity determination task.
Data: 137 articles for training and 150 articles for testing (event-nouns: 722; NOM: 722, ACC: 278, DAT: 72).

Slide 14: Features
  日本        政府 による      民間            支援 が      活性化 した。
  Japanese   government-BY   private sector  support-NOM  activate-PAST
  (The support for the private sector by the Japanese government was activated.)

  Feature      Example                 Instance
  Lexical      WordForm                日本
  Grammatical  POS                     ProperNoun
  Semantic     CoocScore (PMI)         <支援(する), ガ>, 日本 → 2.80
  Positional   NPDependsOnSupportVerb  0 (日本政府による)

Slide 15: Accuracy of argument identification
  Feature      NOM    ACC    DAT
  Baseline     60.5   79.7   73.0
  +SVC         64.2   78.0   71.4
  +COOC        67.1   80.1   74.6
  +SVC +COOC   68.3   80.1   74.6
(SVC: support verb construction; COOC: co-occurrence)
The case alignment dictionary and co-occurrence statistics both improved accuracy.

Slide 16: Related work
- Jiang and Ng (2006): built a maximum-entropy classifier for NomBank (Meyers et al., 2004) based on features developed for PropBank (Palmer et al., 2005)
- Xue (2006): used the Chinese TreeBank
- Liu and Ng (2007): applied Alternating Structure Optimization (ASO) to the argument identification task

Slide 17: Conclusion
- Defined argument structure analysis of event-nouns in Japanese
- Proposed an unsupervised approach that learns contextual patterns of event-nouns for the event classification task
- Performed argument identification using co-occurrence statistics and syntactic clues

Slide 18: Future work
- Explore a semi-supervised approach to the event classification task
- Use more lexical resources for the argument identification task
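As a rough sketch of the tournament model used for argument identification (slide 10): candidates play pairwise matches, and the overall winner is output as the argument. The pairwise classifier below is a stand-in (a hand-written score table invented for illustration); in the paper it is a trained binary classifier over feature vectors (Iida et al., 2006).

```python
from typing import Callable, List

def tournament_select(candidates: List[str],
                      right_wins: Callable[[str, str], bool]) -> str:
    """Tournament decoding: the current winner plays each remaining
    candidate in order; right_wins(left, right) is the pairwise
    classifier's verdict that the right candidate beats the left one."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if right_wins(winner, challenger):
            winner = challenger
    return winner

# Stand-in pairwise classifier: compare hand-assigned scores (invented).
# A real system would query a trained model on features of both candidates.
toy_score = {"日本": 0.3, "政府": 0.9, "民間": 0.6, "活性": 0.1}
right_wins = lambda left, right: toy_score[right] > toy_score[left]

# Candidates for the NOM argument of 支援(する) "support" (slide 10 example):
print(tournament_select(["日本", "政府", "民間", "活性"], right_wins))  # → 政府
```

Because only pairwise comparisons are learned, the model sidesteps the need to score all candidates on one absolute scale, which is what makes it attractive when, as here, the argument is not marked by a case marker.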