Learning-Based Argument Structure Analysis of Event-Nouns in Japanese
Mamoru Komachi, Ryu Iida, Kentaro Inui and Yuji Matsumoto
Graduate School of Information Science
Nara Institute of Science and Technology, JAPAN
19 September 2007
Our goal
• Our city, destroyed by the atomic bomb
• Our city was destroyed by the atomic bomb
• The atomic bomb destroyed our city
• the destruction of our city by the atomic bomb

[Figure: the verb "destroy" and its nominalization "destruction" share the same argument structure: CAUSE = the atomic bomb, UNDERGOER = our city. Applications: IE, MT, Summarization, …]
Argument structure of event-nouns
Kanojo-kara denwa-ga ki-ta
She-ABL phone-NOM come-PAST
(She phoned me.)
[Figure: the verb "come" takes "phone" as its NOM argument; the event-noun "phone" takes NOM = "she" and DAT = "(me)", although "she" is marked only with the surface case marker ABL]

Logical cases for event-nouns are often not marked by case markers.
Task setting
Tom-ga kinou denwa-o ka-tta
Tom-NOM yesterday phone-ACC buy-PAST
(Tom bought a phone yesterday.)
[Figure: the verb "buy" takes NOM = Tom and ACC = phone; whether "phone" has eventhood here, and what its arguments would be, is unknown (marked "?")]
1. Event classification (determine eventhood)
2. Argument identification
Outline
• Introduction
• Argument structure analysis of event-nouns
  – Event classification
  – Argument identification
• Conclusion
• Future work
Unsupervised learning of patterns
[Figure: tree encoding of instances. Positive instances (having eventhood), e.g. "persuasion", "destruction": "… conducted destruction of documents …", where the target noun depends on a verb. Negative instances (not having eventhood), e.g. "chair", "desk": "… a little chair around …", where the target noun appears in the same phrase as an adjective and a preposition. Each instance is encoded in a flat tree using surface text, POS, dependency relations, etc.]
• Encode each instance as a tree and learn contextual patterns as sub-trees with the boosting algorithm BACT (Kudo and Matsumoto, 2004)
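As a concrete illustration of the tree encoding, here is a minimal Python sketch of how one instance could be turned into a flat tree for a BACT-style sub-tree pattern learner. The S-expression format and the node labels (TARGET, TOK, SURF, POS, DEP) are illustrative assumptions, not the exact encoding used in this work.

```python
# Minimal sketch (not the exact encoding of the paper) of turning one
# instance into a flat tree for a BACT-style sub-tree pattern learner.
# The S-expression format and node labels are illustrative assumptions.

def encode_instance(tokens, target):
    """tokens: list of dicts with 'surface', 'pos' and 'head'
    (index of the dependency head, -1 for the root).
    target: index of the candidate event-noun.
    Returns an S-expression whose root is the target noun and whose
    children describe its surface/POS/dependency context."""
    children = []
    for i, tok in enumerate(tokens):
        if i == target:
            continue
        deps = []
        if tok["head"] == target:
            deps.append("(DEP depends-on-target)")
        if tokens[target]["head"] == i:
            deps.append("(DEP target-depends-on)")
        children.append(
            f"(TOK (SURF {tok['surface']}) (POS {tok['pos']}) {' '.join(deps)})")
    return f"(TARGET (SURF {tokens[target]['surface']}) {' '.join(children)})"

# Positive instance: "... conducted destruction of documents ..."
tokens = [
    {"surface": "conducted",   "pos": "Verb",        "head": -1},
    {"surface": "destruction", "pos": "CommonNoun",  "head": 0},
    {"surface": "of",          "pos": "Preposition", "head": 1},
    {"surface": "documents",   "pos": "CommonNoun",  "head": 2},
]
print(encode_instance(tokens, target=1))
```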
Experiments of event classification
• Method: Classify the eventhood of event-nouns with Support Vector Machines
• Data: 80 news articles (800 sentences)
  – 1,237 event-nouns (590 have eventhood)
• Features (see the sketch below):
  – Grammatical features, e.g. HeadPOS: CommonNoun
  – Semantic features, e.g. SemanticCategory: Animate
  – Contextual features, e.g. FollowsVerbalNoun: 1
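For concreteness, a minimal sketch of this classification setup, assuming scikit-learn; the feature names mirror the examples above, and the toy instances and labels are invented for illustration only.

```python
# Minimal sketch of the eventhood classifier, assuming scikit-learn.
# The feature names mirror the examples above; the toy instances and
# labels are invented for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

# One feature dict per event-noun candidate.
instances = [
    {"HeadPOS": "CommonNoun", "SemanticCategory": "Abstract", "FollowsVerbalNoun": 1},
    {"HeadPOS": "CommonNoun", "SemanticCategory": "Animate",  "FollowsVerbalNoun": 0},
]
labels = [1, 0]  # 1 = has eventhood, 0 = does not

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(instances)

classifier = SVC(kernel="linear")
classifier.fit(X, labels)

# Classify a new candidate noun.
test = vectorizer.transform([{"HeadPOS": "CommonNoun", "FollowsVerbalNoun": 1}])
print(classifier.predict(test))
```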
Results of event classification
                          Prec.   Rec.    F
Baseline (predominant)    60.4    88.2    71.7
Proposed (unsupervised)   73.3    80.2    76.6
• Baseline: use the first sense determined by corpus statistics (NAIST Text Corpus)
• Proposed: machine-learning-based classifier
• Precision = correct / event-nouns classified by the system as having eventhood
• Recall = correct / all event-nouns in the corpus
The proposed method outperforms the baseline in precision and F by using contextual patterns, and can be improved further by adding more data.
Outline
• Introduction
• Argument structure analysis of event-nouns
  – Event classification
  – Argument identification
• Conclusion
• Future work
Argument identification
Build a classifier using the tournament model (Iida et al., 2006)
日本 政府 による 民間 支援 が 活性化 した。
Japanese government-BY private-sector support-NOM activate-PAST
(The support for the private sector by the Japanese government was activated.)

[Figure: tournament model for identifying the NOM argument of the event-noun 支援(する) "support". Candidate arguments (日本, 政府, 民間, 活性) are compared in pairwise matches, each labeled with the winning side, e.g. (政府, 民間) → L:政府 and (日本, 政府) → R:政府. Training generates these labeled pairs; decoding runs the matches and selects the surviving candidate, here 政府 "government", as the argument.]
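A minimal Python sketch of the step-ladder decoding used by a tournament model; the pairwise decision function below is a toy stand-in for the trained binary classifier of the real model.

```python
# Minimal sketch of tournament-model decoding. The pairwise decision
# function stands in for a trained binary classifier over the two
# candidates, the event-noun, and the target case; the toy rule below
# is for illustration only.

def tournament(candidates, prefer_right):
    """Step-ladder tournament: the current winner plays the next
    candidate; prefer_right(left, right) is True if the right-hand
    candidate wins the match. The last survivor is selected."""
    winner = candidates[0]
    for challenger in candidates[1:]:
        if prefer_right(winner, challenger):
            winner = challenger
    return winner

# Candidates for the NOM argument of 支援(する) "support" in the example.
# Toy rule: a real system would call the trained classifier here.
print(tournament(["日本", "政府", "民間", "活性"],
                 prefer_right=lambda left, right: right == "政府"))  # -> 政府
```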
Calculation of PMI using pLSI
Estimate the pointwise mutual information between a verb–case pair <v, c> and a noun n that depends on verb v through case marker c, using Probabilistic Latent Semantic Indexing (Hofmann, 1999), following Fujita et al. (2004).
Dimensionality reduction with a hidden class z:

P(<v,c>, n) = Σ_{z ∈ Z} P(<v,c> | z) P(n | z) P(z)

Example: "… pay for the shoes" → <v,c> = <pay, for>, n = shoes

PMI(<v,c>, n) = log [ P(<v,c>, n) / ( P(<v,c>) P(n) ) ]
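A minimal sketch of this computation under the factorization above; the probability tables for <v,c> pairs, nouns, and hidden classes are toy values invented for illustration.

```python
# Minimal sketch of PMI(<v,c>, n) under the pLSI factorization above.
# The probability tables below are toy values invented for illustration:
# two <v,c> pairs, two nouns, and two hidden classes z.
import numpy as np

P_vc_given_z = np.array([[0.6, 0.1],    # <pay, for>
                         [0.4, 0.9]])   # another <v,c> pair
P_n_given_z  = np.array([[0.7, 0.2],    # shoes
                         [0.3, 0.8]])   # another noun
P_z = np.array([0.5, 0.5])              # P(z) for the hidden classes

def joint(vc, n):
    """P(<v,c>, n) = sum over z of P(<v,c>|z) P(n|z) P(z)."""
    return np.sum(P_vc_given_z[vc] * P_n_given_z[n] * P_z)

def pmi(vc, n):
    """PMI(<v,c>, n) = log P(<v,c>, n) / (P(<v,c>) P(n))."""
    p_vc = sum(joint(vc, j) for j in range(P_n_given_z.shape[0]))
    p_n = sum(joint(i, n) for i in range(P_vc_given_z.shape[0]))
    return np.log(joint(vc, n) / (p_vc * p_n))

print(pmi(vc=0, n=0))   # PMI(<pay, for>, shoes) on the toy tables
```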
Case alignment dictionary
In NomBank, 20% of the arguments that occur outside the NP appear in support verb constructions (Jiang and Ng, 2006)
kare-ga kanojo-ni benkyo-o oshie-ta
he-NOM her-DAT study-ACC teach-PAST
(He taught a lesson to her.)

Case alternation:
kanojo-ga benkyo-sita
she-NOM study-PAST
(She studied.)

Dictionary entry: (ACC_event, oshie-ru "teach") = DAT_pred → NOM_event
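A minimal sketch of how such a case alignment dictionary could be represented and consulted; the data structure and the lookup function are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of a case alignment dictionary for support verb
# constructions. The data structure is an illustrative assumption: it maps
# (case of the event-noun under the verb, verb) to a mapping from
# predicate-side cases to event-noun-side cases.
case_alignment = {
    ("ACC", "oshie-ru"): {"DAT": "NOM"},   # "teach": DAT of the predicate -> NOM of the event-noun
}

def align_case(event_noun_case, verb, predicate_case):
    """Return the event-noun case filled by a predicate argument,
    or None if the dictionary has no entry for this construction."""
    return case_alignment.get((event_noun_case, verb), {}).get(predicate_case)

# kanojo-ni (her-DAT) under oshie-ta fills the NOM slot of benkyo "study":
print(align_case("ACC", "oshie-ru", "DAT"))   # -> NOM
```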
Experiments of argument identification
Method: Apply the Japanese zero-anaphora resolution model (Iida et al., 2006) to the argument identification task
  – Both tasks lack case markers
  – Event classification corresponds to the anaphoricity determination task
Data: 137 articles for training and 150 articles for testing (event-nouns: 722, NOM: 722, ACC: 278, DAT: 72)
Features
日本 政府 による 民間 支援 が 活性化 した。
Japanese government-BY private-sector support-NOM activate-PAST
(The support for the private sector by the Japanese government was activated.)

Feature       Example                  Instance
Lexical       WordForm                 日本
Grammatical   POS                      ProperNoun
Semantic      CoocScore (PMI)          <支援(する), ガ>, 日本 → 2.80
Positional    NPDependsOnSupportVerb   0 (日本政府による)
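For concreteness, the table row values for the candidate 日本 could be bundled into a single feature dictionary, as in the following sketch (the representation is an assumption for illustration only).

```python
# One candidate's feature dict for argument identification, mirroring
# the table above (values for the candidate 日本 and the event-noun
# 支援(する)); the representation is an illustrative assumption.
candidate_features = {
    "WordForm": "日本",                # lexical
    "POS": "ProperNoun",               # grammatical
    "CoocScore": 2.80,                 # semantic: PMI(<支援(する), ガ>, 日本)
    "NPDependsOnSupportVerb": 0,       # positional (日本政府による)
}
```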
Accuracy of argument identification
Feature       NOM    ACC    DAT
Baseline      60.5   79.7   73.0
+SVC          64.2   78.0   71.4
+COOC         67.1   80.1   74.6
+SVC+COOC     68.3   80.1   74.6
The case alignment dictionary and co-occurrence statistics improved accuracy.
(SVC: support verb construction; COOC: co-occurrence)
Related work
Jiang and Ng (2006)
  Built a maximum entropy classifier for NomBank (Meyers et al., 2004) based on features designed for PropBank (Palmer et al., 2005)
Xue (2006)
  Used the Chinese TreeBank
Liu and Ng (2007)
  Applied Alternating Structure Optimization (ASO) to the argument identification task
Conclusion
Defined the task of argument structure analysis of event-nouns in Japanese
Proposed an unsupervised approach that learns contextual patterns of event-nouns for the event classification task
Performed argument identification using co-occurrence statistics and syntactic clues
Future work
Explore a semi-supervised approach to the event classification task
Use more lexical resources for the argument identification task