Introduction
Procedures
Machine Learning and Benchmark
Evaluation
Results and Conclusion
Automatic Acquisition of Qualia Structure
from Corpus Data
by Ichiro Yamada, Timothy Baldwin, Nonmembers, Hideki
Sumiyoshi, Masahiro Shibata, and Nobuyuki Yagi, Members
Presenter: Yu-Yun Chang
Graduate Institute of Linguistics,
National Taiwan University
April 30, 2014
1 Introduction
2 Procedures
3 Machine Learning and Benchmark
4 Evaluation
5 Results and Conclusion
Introduction
motivation: manually maintaining and updating lexical relations
is hard work
aim: automatically discover lexical knowledge from a corpus,
e.g. with Maximum Entropy learning
focus on the telic and agentive roles of nouns
because the formal and constitutive roles have already received
enough attention in previous studies
while the telic and agentive roles are still not available through
automatic acquisition or in large-scale lexical resources
goal: generate a ranked list of verbs for a given noun for
each of the telic and agentive roles
Qualia Structure
from the Generative Lexicon by Pustejovsky (1995)
captures our understanding of an object or a relation in
the world
the qualia structure of a lexical item includes four roles:
formal role: the basic category that distinguishes the
meaning of a word within a larger domain, e.g. shape,
color, magnitude, etc.
constitutive role: the internal constitution of the entity,
e.g. component elements
telic role: the typical function of the entity
agentive role: the origin of the entity or its coming into
being, e.g. creator, artifact, natural kind, etc.
Qualia Structure
An Example ...
note the telic and agentive roles in particular
the qualia structure of book
formal role: publication
constitutive role: text
telic role: read, study, publish (predicates)
agentive role: write, study, publish (predicates)
application of telic and agentive roles
"Mary finished her beer." is understood as
"Mary finished drinking her beer." (drink: telic role of beer)
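The book example can be written down as a small data structure. This is only an illustrative sketch: the class name `QualiaStructure`, its field names, and the helper `verbs_for` are assumptions made here, not part of the presented work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualiaStructure:
    """Four qualia roles of a noun (hypothetical container, for illustration)."""
    noun: str
    formal: List[str] = field(default_factory=list)
    constitutive: List[str] = field(default_factory=list)
    telic: List[str] = field(default_factory=list)     # verbs: typical function
    agentive: List[str] = field(default_factory=list)  # verbs: coming into being

    def verbs_for(self, role: str) -> List[str]:
        """Return the verbs (predicates) stored for the telic or agentive role."""
        if role not in ("telic", "agentive"):
            raise ValueError("role must be 'telic' or 'agentive'")
        return getattr(self, role)


# Values copied from the slide's example for "book".
book = QualiaStructure(
    noun="book",
    formal=["publication"],
    constitutive=["text"],
    telic=["read", "study", "publish"],
    agentive=["write", "study", "publish"],
)
```

Only the telic and agentive fields hold verbs, which is why the study ranks verbs for these two roles.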
Qualia Pairs: noun-verb pairs
The noun-verb pairs for the telic and agentive roles ...
Bouillon et al. use inductive logic programming to
identify noun-verb pairs from corpus data
Cimiano and Wenderoth use POS-tagged web data and
Google counts for a quantitative analysis
In this study,
identify the qualia roles of an arbitrary noun
prefer a qualitative analysis
for each of the telic and agentive roles ...
generate a ranked list of verbs given a noun
Procedures
Resources I
corpus: BNC tokenized sentences (dependency-parsed)
use the CLAWS-2 tagset to tag raw sentences
use a tag-sequence grammar over word-level tags
use RASP to tag grammatical relations (23 relations)
output: dependency tuples, each with a head, its dependents,
and a relation
ncmod( , ticket NN1, airplane NN2) airplane tickets
dobj(read VV0, book NN2, ) read books
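The two tuples above can be read into a simple structure for later filtering by noun. The sketch below parses the notation exactly as it is printed on this slide (relation name, then comma-separated "word TAG" slots, with empty slots allowed); it is not a parser for actual RASP output, and the names `DepTuple` and `parse_tuple` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class DepTuple(NamedTuple):
    relation: str
    # Each slot is (word, tag), or None when the slot is empty on the slide.
    slots: Tuple[Optional[Tuple[str, str]], ...]


def parse_tuple(text: str) -> DepTuple:
    """Parse a tuple written in the slide's notation, e.g. 'dobj(read VV0, book NN2, )'."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a dependency tuple: {text!r}")
    relation, body = match.groups()
    slots = []
    for part in body.split(","):
        part = part.strip()
        if not part:
            slots.append(None)       # empty slot
        else:
            word, tag = part.split()  # 'book NN2' -> ('book', 'NN2')
            slots.append((word, tag))
    return DepTuple(relation, tuple(slots))


def mentions(dep: DepTuple, word: str) -> bool:
    """True if the given word fills any slot of the tuple."""
    return any(slot is not None and slot[0] == word for slot in dep.slots)
```

For example, `mentions(parse_tuple("dobj(read VV0, book NN2, )"), "book")` is true, which is the kind of check needed to collect all tuples containing a target noun.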
Resources II
word list: 30 nouns × 50 verbs = 1,500 noun-verb pairs
30 nouns: 10 from the literature + 20 randomly selected
use the 30 nouns to find all the dependency tuples that
contain them
50 verbs: hand-chosen and randomly selected from the
dependency tuples found, and manually tagged with telic or
agentive roles (gold standard)
evaluate the gold standard data
two native English speakers
evaluate the noun-verb pairs for the telic and agentive
roles
on a scale from 0 to 10
take the mean of the two annotators' ratings
Resources III
most of the verbs are judged to be unrelated to the
noun under the given qualia relation
based on the variance, the annotators' judgments are
more clear-cut for the agentive role than for the telic role
Machine Learning and Benchmark
Machine learning: Maximum Entropy Classifier
Benchmark: Hand-generated templates
Machine Learning I
Maximum Entropy Learning ...
categorize the gold standard data
positive: 7-10 (the noun-verb pair is adequate)
negative: 0 (the noun-verb pair is impossible)
29 nouns for training and 1 noun for testing
for the selected noun-verb pairs, extract all
the relevant sentences from the parsed BNC
based on the RASP relations, extract information about
each noun-verb pair, the noun, the verb, etc.
organize the extracted information into feature sets
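The labeling and the train/test split can be sketched as follows. Several points are assumptions made here, not stated on the slide: that the thresholds apply to the mean of the two annotators' ratings, that pairs scoring between 0 and 7 are left out of training, and that the "29 nouns for training and 1 noun for testing" split is rotated over all nouns. The function names are hypothetical.

```python
from statistics import mean
from typing import Iterator, List, Optional, Tuple


def label_pair(score_a: float, score_b: float) -> Optional[str]:
    """Label one noun-verb pair from two annotator ratings on a 0-10 scale."""
    avg = mean([score_a, score_b])
    if avg >= 7:        # assumed reading of "positive: 7-10"
        return "positive"
    if avg == 0:        # assumed reading of "negative: 0"
        return "negative"
    return None         # assumed: intermediate pairs are not used


def leave_one_noun_out(nouns: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training nouns, held-out test noun), one split per noun."""
    for held_out in nouns:
        yield [n for n in nouns if n != held_out], held_out
```

With 30 nouns this yields 30 splits, each training on 29 nouns and testing on the remaining one.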
Machine Learning II
algorithms:
Probability: given the noun-verb pair occurrences, the
probability of assigning the telic or agentive role
Mutual Information: to reduce the frequency effects caused by
raw counts of noun-verb pair occurrences
Maximum Entropy: the larger the value, the more adequate
the noun-verb pair
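The slide does not give the exact formulas, so the following is only a sketch of the standard pointwise mutual information, log2(P(n, v) / (P(n) P(v))), estimated from co-occurrence counts. How the authors combine it with the probabilities and the Maximum Entropy model is not shown here; the function name `pmi_scores` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_scores(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information for each observed (noun, verb) pair."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    verb_counts = Counter(verb for _, verb in pairs)
    total = len(pairs)
    scores = {}
    for (noun, verb), count in pair_counts.items():
        p_joint = count / total
        p_noun = noun_counts[noun] / total
        p_verb = verb_counts[verb] / total
        # Positive when the pair co-occurs more often than chance predicts.
        scores[(noun, verb)] = math.log2(p_joint / (p_noun * p_verb))
    return scores
```

Unlike a raw count, the score of a pair is discounted when the noun or the verb is frequent on its own, which is the effect the Mutual Information step is meant to control.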
Machine Learning
Can I have a book to read?
Extract the featuresets
Probabilities → MI → Maximum Entropy
Hand-Generated Templates I
used as a benchmark for Maximum Entropy Classifier
all templates assume that the noun will occur as the deep
object of a transitive verb
telic role: 8 constructional templates
agentive role: 1 constructional template
Hand-Generated Templates II
count the relative frequency and rank the verbs based on
scores
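A minimal sketch of this ranking step, assuming the template matches have already been extracted as (noun, verb) pairs and that "relative frequency" means a verb's share of all template matches for that noun. Neither the templates themselves nor the exact normalization appear on the slide, so both are assumptions, and `rank_verbs` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def rank_verbs(matches: List[Tuple[str, str]], noun: str) -> List[Tuple[str, float]]:
    """Rank the verbs matched with `noun` by their relative frequency."""
    counts = Counter(verb for n, verb in matches if n == noun)
    total = sum(counts.values())
    if total == 0:
        return []  # the noun never matched any template
    scored = [(verb, count / total) for verb, count in counts.items()]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Called once per role (with the matches from that role's templates), this gives one ranked verb list per noun and role.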
Evaluation on Ranked Lists
Variant of Spearman’s Rank Correlation
used because most verbs cannot be construed as fulfilling the
telic or agentive role of a given noun
values range from 0 to 1 (1 represents highly correlated)
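The authors' variant is not defined on the slide. For reference, the sketch below computes the standard Spearman coefficient (Pearson correlation of average ranks, so ties are handled), which ranges from -1 to 1 rather than 0 to 1. It is a baseline for comparison under that assumption, not the measure used in the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard Spearman rank correlation of two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

When most verbs receive a gold score of 0, many ranks are tied, which is one plausible reason a plain coefficient is less informative here and a variant was preferred.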
Results and Conclusion
the gold standard data has high agreement at higher ranks
the gold standard has greater variation in interpreting the
telic role.
in Top-1, templates > maximum entropy
generally speaking, maximum entropy > templates
agentive: ME 0.605, Templates 0.5
telic: ME 0.479, Templates 0.337
Gold: 0.816, 0.659
The Maximum Entropy classifier is more successful at
identifying qualia structure than the traditional
template-based approach.
– End of Presentation –
Thank you