Automatic Acquisition of Qualia Structure from Corpus Data
by Ichiro Yamada, Timothy Baldwin (Nonmembers), Hideki Sumiyoshi, Masahiro Shibata, and Nobuyuki Yagi (Members)

Presenter: Yu-Yun Chang
Graduate Institute of Linguistics, National Taiwan University
April 30, 2014

Outline
1 Introduction
2 Procedures
3 Machine Learning and Benchmark
4 Evaluation
5 Results and Conclusion

1 Introduction

Introduction
- Motivation: manually maintaining and updating lexical relations is hard work.
- Aim: automatically discover lexical knowledge from a corpus, e.g. with Maximum Entropy learning.
- Focus: the telic and agentive roles of nouns, because
  - the formal and constitutive roles have already received enough attention in previous studies;
  - the telic and agentive roles are still not available through automatic acquisition or in large-scale lexical resources.
- Goal: generate a ranked list of verbs for a given noun, for each of the telic and agentive roles.

Qualia Structure
- From the Generative Lexicon by Pustejovsky (1995).
- Captures our understanding of an object or a relation in the world.
- The qualia structure of a lexical item includes four roles:
  - formal role: the basic category that distinguishes the meaning of a word within a larger domain, e.g. shape, color, magnitude
  - constitutive role: the internal constitution of the entity, e.g. component elements
  - telic role: the typical function of the entity
  - agentive role: the origin of the entity or its coming into being, e.g. creator, artifact, natural kind

Qualia Structure: An Example (note the telic and agentive roles)
- The qualia structure of "book":
  - formal role: publication
  - constitutive role: text
  - telic role: read, study, publish (predicates)
  - agentive role: write, study, publish (predicates)
- Application of the telic and agentive roles:
  - "Mary finished her beer." is understood as "Mary finished drinking her beer."; the telic role of "beer" supplies the missing predicate.

Qualia Pairs: noun-verb pairs
- Previous work on noun-verb pairs for the telic and agentive roles:
  - Bouillon et al. use inductive logic programming to identify noun-verb pairs from corpus data.
  - Cimiano and Wenderoth use POS-tagged web data and Google counts for a quantitative analysis.
- In this study:
  - identify the qualia roles of an arbitrary noun
  - prefer a qualitative analysis
  - for each of the telic and agentive roles, generate a ranked list of verbs given a noun

2 Procedures

Resources I: corpus
- BNC tokenized sentences (dependency-parsed):
  - use the CLAWS-2 tagset to tag raw sentences
  - use a tag-sequence grammar over word-level tags
  - use RASP to tag relations (23 relations)
- Output: dependency tuples, each consisting of a head, its dependents, and a relation
  ncmod(_, ticket_NN1, airplane_NN2)    airplane tickets
  dobj(read_VV0, book_NN2, _)           read books

Resources II: word list
- 30 nouns * 50 verbs = 1500 noun-verb pairs
  - 30 nouns: 10 from the literature + 20 randomly selected
  - the 30 nouns are used to find all the dependency tuples that contain them
  - 50 verbs: hand-chosen and randomly selected from the dependency tuples found, then manually tagged with telic or agentive roles (gold standard)
- Evaluating the gold standard data:
  - two native English speakers rate the noun-verb pairs for the telic and agentive roles
  - on a scale from 0 to 10
  - the score is the mean of the two annotators' ratings
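The two resource-building steps above (finding candidate verbs for the target nouns in dependency tuples, and averaging the two annotators' ratings) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tuple layout, the sample data, and the names `candidate_verbs` and `gold_score` are hypothetical and do not reproduce RASP output or the authors' tooling. The only tagset fact relied on is that CLAWS verb tags begin with "V".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical dependency tuples: (relation, head, head_tag, dependent, dependent_tag).
# Real RASP output is richer; this flat shape is an assumption for illustration.
TUPLES = [
    ("dobj", "read", "VV0", "book", "NN1"),
    ("dobj", "write", "VVD", "book", "NN1"),
    ("dobj", "read", "VVD", "book", "NN2"),
    ("ncmod", "ticket", "NN1", "airplane", "NN1"),
    ("dobj", "drink", "VV0", "beer", "NN1"),
]


def candidate_verbs(tuples, nouns):
    """Count, per target noun, the verbs that share a dependency tuple with it."""
    counts = defaultdict(lambda: defaultdict(int))
    for _relation, head, head_tag, dep, dep_tag in tuples:
        # Noun as dependent of a verbal head (e.g. a direct object).
        if dep in nouns and head_tag.startswith("V"):
            counts[dep][head] += 1
        # Noun as head with a verbal dependent.
        elif head in nouns and dep_tag.startswith("V"):
            counts[head][dep] += 1
    return {noun: dict(verbs) for noun, verbs in counts.items()}


def gold_score(rating_a, rating_b):
    """Mean of two annotators' 0-10 ratings for one noun-verb pair and one role."""
    for rating in (rating_a, rating_b):
        if not 0 <= rating <= 10:
            raise ValueError("ratings must lie on the 0-10 scale")
    return mean((rating_a, rating_b))


pairs = candidate_verbs(TUPLES, {"book", "beer"})
print(pairs)
print(gold_score(9, 8))
```

The `ncmod` tuple is ignored here because neither of its members is a target noun paired with a verb; in the study the verb list was additionally hand-chosen and sampled, which this sketch does not model.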
Resources III
- Most of the verbs are judged to be unrelated to the noun under the given qualia relation.
- Based on the variance, the annotators' judgments are more clear-cut for the agentive role than for the telic role.

3 Machine Learning and Benchmark

Machine Learning and Benchmark
- Machine learning: Maximum Entropy classifier
- Benchmark: hand-generated templates

Machine Learning I: Maximum Entropy learning
- Categorize the gold standard data:
  - positive: 7-10 (the noun-verb pair is adequate)
  - negative: 0 (the noun-verb pair is impossible)
- 29 nouns for training and 1 noun for testing.
- For the selected noun-verb pairs, extract all the relevant parsed BNC sentences.
- Based on the RASP relations, extract the information about the given noun-verb pair, the noun, the verb, etc.
- Organize the extracted information into feature sets.

Machine Learning II: algorithms
- Probability: given the noun-verb pair occurrences, the probability of assigning the telic or agentive role.
- Mutual Information: reduces the effects caused by counting raw noun-verb pair occurrences.
- Maximum Entropy: the larger the value, the better the adequacy.

Machine Learning: example
- "Can I have a book to read?"
- Extract the feature sets.
- Probabilities → MI → Maximum Entropy

Hand-Generated Templates I
- Used as a benchmark for the Maximum Entropy classifier.
- All templates assume that the noun occurs as the deep object of a transitive verb.
- Telic role: 8 constructional templates
- Agentive role: 1 constructional template

Hand-Generated Templates II
- Count the relative frequency and rank the verbs by score.

4 Evaluation

Evaluation on Ranked Lists
- A variant of Spearman's rank correlation is used, because most verbs cannot be construed as fulfilling the telic or agentive role of a given noun.
- Values range from 0 to 1 (1 means highly correlated).

5 Results and Conclusion

Results and Conclusion
- The gold standard data has high agreement at higher ranks.
- The gold standard has greater variation in interpreting the telic role.
- In Top-1: templates > Maximum Entropy.
- Generally speaking: Maximum Entropy > templates.

              ME       Templates
  agentive    0.605    0.5
  telic       0.479    0.337

  Gold        0.816    0.659

- The Maximum Entropy classifier is more successful at identifying qualia structure than the traditional template-based approach.

End of Presentation
Thank you
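Supplementary note on the evaluation: the template benchmark ranks verbs by relative frequency, and ranked lists are compared with the gold standard by a rank correlation. The sketch below illustrates that pipeline under explicit assumptions. The slides do not define the variant of Spearman's rank correlation used in the paper, so this code uses the standard coefficient (Pearson correlation of average ranks), which ranges from -1 to 1 rather than 0 to 1. The counts, gold scores, and function names are hypothetical, and normalizing counts per noun is an assumed reading of "relative frequency".

```python
def relative_frequency_scores(counts):
    """Turn raw template-match counts for one noun into relative frequencies."""
    total = sum(counts.values())
    return {verb: n / total for verb, n in counts.items()}


def average_ranks(scores):
    """Rank 1 is the highest score; tied items share the mean of their positions."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    """Standard Spearman coefficient over the items shared by both score maps."""
    keys = sorted(set(scores_a) & set(scores_b))
    ranks_a = average_ranks({k: scores_a[k] for k in keys})
    ranks_b = average_ranks({k: scores_b[k] for k in keys})
    xs = [ranks_a[k] for k in keys]
    ys = [ranks_b[k] for k in keys]
    n = len(keys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical template-match counts and gold telic scores for the noun "book".
template_counts = {"read": 6, "write": 3, "buy": 1}
gold_telic = {"read": 9.5, "write": 1.0, "buy": 4.0}

system_scores = relative_frequency_scores(template_counts)
rho = spearman(system_scores, gold_telic)
print(system_scores, rho)
```

With these made-up numbers the system and the gold standard agree on the top verb but swap the other two, which is the kind of disagreement a rank correlation is meant to quantify.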