Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution
Ryu Iida, Kentaro Inui and Yuji Matsumoto
Nara Institute of Science and Technology
{ryu-i,inui,matsu}@is.naist.jp
June 20th, 2006
1
Zero-anaphora resolution



- Zero-anaphor = a gap with an anaphoric function
- Zero-anaphora resolution is becoming important in many applications
- In Japanese, even obligatory arguments of a predicate are often omitted when they are inferable from the context
  - 45.5% of the nominative arguments of verbs are omitted in newspaper articles
2
Zero-anaphora resolution (cont’d)

Three sub-tasks:
- Zero-pronoun detection: detect a zero-pronoun
- Antecedent identification: identify the antecedent for a given zero-pronoun
- Anaphoricity determination: determine whether a given zero-pronoun is anaphoric or non-anaphoric
Mary-wa
John-ni
anaphoric zero-pronoun
(φ-ga ) tabako-o
Mary-NOM John-DAT (φ-NOM ) smoking-OBJ
[Mary asked John to quit smoking.]
yameru-youni it-ta
quit-COMP
say-PAST
3
Zero-anaphora resolution (cont’d)

Three sub-tasks:
- Zero-pronoun detection: detect a zero-pronoun
- Antecedent identification: identify the antecedent from the set of candidate antecedents for a given zero-pronoun
- Anaphoricity determination: classify whether a given zero-pronoun is anaphoric or non-anaphoric
Anaphoric zero-pronoun (antecedent: John):
  Mary-wa   John-ni   (φ-ga)   tabako-o     yameru-youni  it-ta
  Mary-TOP  John-DAT  (φ-NOM)  smoking-OBJ  quit-COMP     say-PAST
  [Mary asked John to quit smoking.]
Non-anaphoric zero-pronoun:
  (φ-ga)   ie-ni     kaeri-tai
  (φ-NOM)  home-DAT  want-to-go-back
  [(φ = I) want to go home.]
4
Previous work on anaphora resolution

- The research trend has been shifting from rule-based approaches (Baldwin, 95; Lappin and Leass, 94; Mitkov, 97, etc.) to empirical, learning-based approaches (Soon et al., 01; Ng, 04; Yang et al., 05, etc.)
  - A cost-efficient way to achieve performance comparable to the best-performing rule-based systems
- Learning-based approaches represent each sub-problem (anaphoricity determination and antecedent identification) as a set of feature vectors and apply machine learning algorithms to them
5
Syntactic pattern features

- Useful clues for both anaphoricity determination and antecedent identification
[Figure: dependency tree of the example sentence — Mary-wa (Mary-TOP), John-ni (John-DAT, antecedent), φ-ga (φ-NOM, zero-pronoun), tabako-o (smoking-OBJ), yameru-youni (quit-COMP, predicate), it-ta (say-PAST, predicate)]
6
Syntactic pattern features

- Useful clues for both anaphoricity determination and antecedent identification
[Figure: dependency tree of the example sentence — Mary-wa (Mary-TOP), John-ni (John-DAT, antecedent), φ-ga (φ-NOM, zero-pronoun), tabako-o (smoking-OBJ), yameru-youni (quit-COMP, predicate), it-ta (say-PAST, predicate)]
Questions:
- How to encode syntactic patterns as features?
- How to avoid the data sparseness problem?
7
Talk outline
1. Zero-anaphora resolution: Background
2. Selection-then-classification model (Iida et al., 05)
3. Proposed model
   - Represents syntactic patterns based on dependency trees
   - Uses a tree mining technique to find useful sub-trees, avoiding the data sparseness problem
   - Incorporates syntactic pattern features into the selection-then-classification model
4. Experiments on Japanese zero-anaphora
5. Conclusion and future work
8
Selection-then-Classification Model
(SCM) (Iida et al., 05)
A federal judge in Pittsburgh issued a temporary restraining order preventing Trans
World Airlines from buying additional shares of USAir Group Inc. The order,
requested in a suit filed by USAir, …
[Figure: the candidate antecedents (federal judge, order, …, USAir Group Inc, suit) preceding the candidate anaphor "USAir" are passed to the tournament model]
9
Selection-then-Classification Model
(SCM) (Iida et al., 05)
[Figure: the tournament model (Iida et al. 03) conducts pairwise comparisons between the candidate antecedents (federal judge, order, …, USAir Group Inc, suit) of the candidate anaphor "USAir"]
10
Selection-then-Classification Model
(SCM) (Iida et al., 05)
[Figure: the tournament model selects "USAir Group Inc" from the candidate antecedents as the most likely candidate antecedent of the candidate anaphor "USAir"]
11
Selection-then-Classification Model
(SCM) (Iida et al., 05)
[Figure: the most likely candidate antecedent ("USAir Group Inc") and the candidate anaphor ("USAir") are passed to the anaphoricity determination model]
- If score ≥ θ_ana: "USAir" is anaphoric and "USAir Group Inc" is its antecedent
- If score < θ_ana: "USAir" is non-anaphoric
12
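Taken together, the slides above describe the SCM control flow: a tournament over the candidate antecedents followed by a thresholded anaphoricity check on the winner. Below is a minimal sketch of that flow; the two scorers (`prefer_right`, `anaphoricity_score`) and the threshold `theta_ana` are assumed interfaces standing in for the trained models, not the authors' implementation.

```python
# Sketch of the selection-then-classification model (SCM) control flow.
# The two scoring functions stand in for trained classifiers; they are
# hypothetical placeholders, not the models from Iida et al. (05).

def select_antecedent(candidates, anaphor, prefer_right):
    """Tournament: compare candidates pairwise, keep the winner of each round."""
    best = candidates[0]
    for challenger in candidates[1:]:
        # prefer_right(left, right, anaphor) returns True if the right-hand
        # candidate wins the pairwise comparison (a trained binary classifier).
        if prefer_right(best, challenger, anaphor):
            best = challenger
    return best

def resolve(candidates, anaphor, prefer_right, anaphoricity_score, theta_ana):
    """Selection-then-classification: pick a candidate, then accept or reject it."""
    if not candidates:
        return None  # nothing to link to
    best = select_antecedent(candidates, anaphor, prefer_right)
    if anaphoricity_score(best, anaphor) >= theta_ana:
        return best  # the anaphor is anaphoric; best is its antecedent
    return None      # the anaphor is judged non-anaphoric
```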
Training the anaphoricity determination
model

[Figure: generating training instances for the anaphoricity determination model]
- Anaphoric case: for an anaphoric noun phrase (ANP), the tournament model selects the most likely candidate from its set of candidate antecedents (NP1, NP2, NP3, ...); the selected candidate paired with the ANP forms an anaphoric instance
- Non-anaphoric case: for a non-anaphoric noun phrase (NANP), the candidate selected from its candidate antecedents (NP4, NP5, ...) paired with the NANP forms a non-anaphoric instance
14
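A hedged sketch of the training-instance generation described on the slide above: the tournament model selects the most likely candidate for each noun phrase, and the resulting (candidate, noun phrase) pair is labeled positive for anaphoric NPs and negative for non-anaphoric NPs. `select_antecedent` is the helper from the earlier sketch; the attribute names are assumptions.

```python
# Sketch of training-instance generation for the anaphoricity determination
# model. `np.candidates` and `np.is_anaphoric` are assumed attributes of a
# noun-phrase object; they are illustrative, not the authors' data format.

def make_anaphoricity_instances(noun_phrases, prefer_right):
    instances = []
    for np in noun_phrases:
        if not np.candidates:
            continue
        best = select_antecedent(np.candidates, np, prefer_right)
        # Pair (selected candidate, noun phrase); label +1 for an anaphoric
        # NP (ANP) and -1 for a non-anaphoric NP (NANP).
        label = +1 if np.is_anaphoric else -1
        instances.append(((best, np), label))
    return instances
```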
Talk outline
1. Zero-anaphora resolution: Background
2. Selection-then-classification model (Iida et al., 05)
3. Proposed model
   - Represents syntactic patterns based on dependency trees
   - Uses a tree mining technique to find useful sub-trees, avoiding the data sparseness problem
   - Incorporates syntactic pattern features into the selection-then-classification model
4. Experiments on Japanese zero-anaphora
5. Conclusion and future work
15
New model
[Figure: the new model adds syntactic pattern features to the SCM]
- Each tournament comparison between a left candidate (LeftCand) and a right candidate (RightCand) is encoded as three sub-trees: T_L (LeftCand, the zero-pronoun and their predicates), T_R (RightCand, the zero-pronoun and their predicates) and T_I (both candidates and the predicates)
- As before, the winning candidate ("USAir Group Inc") and the candidate anaphor ("USAir") are passed to the anaphoricity determination model: if score ≥ θ_ana, "USAir" is anaphoric and "USAir Group Inc" is its antecedent; if score < θ_ana, "USAir" is non-anaphoric
16
Use of syntactic pattern features

- Encoding parse tree features
- Learning useful sub-trees
17
Encoding parse tree features
[Figure: dependency tree of the example sentence — Mary-wa (Mary-TOP), John-ni (John-DAT, antecedent), φ-ga (φ-NOM, zero-pronoun), tabako-o (smoking-OBJ), yameru-youni (quit-COMP, predicate), it-ta (say-PAST, predicate)]
18
Encoding parse tree features
[Figure: only the nodes needed for the relation are kept — John-ni (John-DAT) is generalized to "Antecedent", φ-ga (φ-NOM) to "zero-pronoun", and yameru-youni (quit-COMP) and it-ta (say-PAST) to "predicate"]
20
Encoding parse tree features
[Figure: each kept node is labeled with its role and its functional word — Antecedent + ni (DAT), zero-pronoun + ga (NOM), predicate + youni (CONJ), predicate + ta (PAST)]
21
Encoding parse trees
[Figure: the example tree with LeftCand (Mary-wa, Mary-TOP) and RightCand (John-ni, John-DAT) marked, and the three sub-trees extracted from it — T_L: LeftCand, its predicate and the zero-pronoun; T_R: RightCand, its predicate and the zero-pronoun; T_I: LeftCand, RightCand and their predicates]
22
Encoding parse trees

- Antecedent identification
[Figure: the three sub-trees T_L, T_R and T_I are attached under a common root node, forming one tree instance]
23
Encoding parse trees

- Antecedent identification
[Figure: in addition to the three sub-trees T_L, T_R and T_I, lexical, grammatical, semantic, positional and heuristic binary features (f1, f2, …, fn) are attached under the root node]
24
Encoding parse trees

- Antecedent identification
[Figure: the root of the tree instance carries the class label (Left or Right, i.e. which candidate is the better antecedent) together with the three sub-trees and the binary feature nodes f1, f2, …, fn]
25
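To make the encoding on slides 18-25 concrete, here is a minimal sketch of one abstracted tree instance using a small `Node` class of my own: content words on the dependency path are replaced by role labels (LeftCand/RightCand, zero-pronoun, predicate) while their functional words (case particles, conjunctions, tense) are kept, and binary feature nodes sit under the root. The exact shape and label format are illustrative assumptions, not the authors' exact representation.

```python
# Sketch of an encoded tree instance for antecedent identification.
# Node labels follow the slides (role label + retained functional word);
# the concrete structure is an illustrative assumption.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def serialize(self):
        if not self.children:
            return self.label
        inner = " ".join(c.serialize() for c in self.children)
        return f"({self.label} {inner})"

# T_L-style sub-tree for the example: it-ta heads John-ni and yameru-youni,
# and the zero-pronoun depends on yameru-youni.
t_left = Node("predicate-ta:PAST",
              [Node("LeftCand-ni:DAT"),
               Node("predicate-youni:CONJ",
                    [Node("zero-pronoun-ga:NOM")])])

# Binary features (lexical, grammatical, semantic, positional, heuristic)
features = [Node(f) for f in ("f1", "f2", "fn")]

# The root carries the class label (Left or Right) at training time.
instance = Node("root", [t_left] + features)
print(instance.serialize())
```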
Learning useful sub-trees

- Kernel methods:
  - Tree kernel (Collins and Duffy, 01)
  - Hierarchical DAG kernel (Suzuki et al., 03)
  - Convolution tree kernel (Moschitti, 04)
- Boosting-based algorithm:
  - BACT (Kudo and Matsumoto, 04): the system learns a list of weighted decision stumps with the Boosting algorithm
26
Learning useful sub-trees

- Boosting-based algorithm: BACT
  - Learns a list of weighted decision stumps with Boosting
  - Classifies a given input tree by weighted voting
[Figure: from labeled training instances, BACT learns weighted decision stumps (e.g. a sub-tree with weight 0.4 and label "positive"); applying the stumps to a new tree yields a score (e.g. +0.34 → positive)]
27
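As a rough illustration of the weighted voting described on the slide above, the sketch below scores a serialized tree against a list of (sub-tree, weight) stumps and classifies by the sign of the total. The containment test is simplified to a substring check over serialized trees, and the stumps, weights and instance are made up for illustration; this is not the BACT implementation.

```python
# Sketch of weighted voting over sub-tree decision stumps (the inference side
# of a BACT-style classifier). Sub-tree containment is simplified to a
# substring test over serialized trees, which is only a rough placeholder.

def contains_subtree(tree_str, subtree_str):
    # Placeholder containment test; a real system matches tree structure.
    return subtree_str in tree_str

def classify(tree_str, stumps, threshold=0.0):
    """stumps: list of (subtree_str, weight) pairs learned by boosting."""
    score = sum(w for sub, w in stumps if contains_subtree(tree_str, sub))
    return ("positive" if score >= threshold else "negative"), score

# Made-up stumps and instance, for illustration only.
stumps = [("(predicate-youni:CONJ zero-pronoun-ga:NOM)", 0.4),
          ("LeftCand-ni:DAT", -0.06)]
label, score = classify("(root (predicate-ta:PAST LeftCand-ni:DAT "
                        "(predicate-youni:CONJ zero-pronoun-ga:NOM)))", stumps)
print(label, score)  # both stumps match: score 0.4 - 0.06 = 0.34 -> positive
```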
Overall process
Input: a zero-pronoun φ in the sentence S
1. Intra-sentential model (with syntactic pattern features):
   - if score_intra ≥ θ_intra: output the most likely candidate antecedent appearing in S
   - if score_intra < θ_intra: go to step 2
2. Inter-sentential model:
   - if score_inter ≥ θ_inter: output the most likely candidate appearing outside of S
   - if score_inter < θ_inter: return "non-anaphoric"
28
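A minimal sketch of the cascade on the slide above: try the intra-sentential model first, fall back to the inter-sentential model, and return non-anaphoric when neither clears its threshold. The two resolver functions are assumed interfaces returning a (candidate, score) pair, not the actual implementation.

```python
# Sketch of the overall decision cascade for a zero-pronoun phi in sentence S.
# intra_model and inter_model are assumed interfaces returning (candidate, score).

def resolve_zero_anaphor(phi, sentence, discourse,
                         intra_model, inter_model,
                         theta_intra, theta_inter):
    cand, score = intra_model(phi, sentence)      # uses syntactic pattern features
    if score >= theta_intra:
        return ("intra", cand)                    # antecedent appears within S
    cand, score = inter_model(phi, discourse)
    if score >= theta_inter:
        return ("inter", cand)                    # antecedent appears outside S
    return ("non-anaphoric", None)
```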
Table of contents
1. Zero-anaphora resolution
2. Selection-then-classification model (Iida et al., 05)
3. Proposed model
   - Parse encoding
   - Tree mining
4. Experiments
5. Conclusion and future work
29
Experiments

- Japanese newspaper article corpus annotated with zero-anaphoric relations: 197 texts (1,803 sentences)
  - 995 intra-sentential anaphoric zero-pronouns
  - 754 inter-sentential anaphoric zero-pronouns
  - 603 non-anaphoric zero-pronouns
- Recall = (# of correctly resolved zero-anaphoric relations) / (# of anaphoric zero-pronouns)
- Precision = (# of correctly resolved zero-anaphoric relations) / (# of anaphoric zero-pronouns the model detected)
30
Experimental settings


- Five-fold cross validation
- Comparison among four models:
  - BM: Ng and Cardie (02)'s model
    - Identifies an antecedent with candidate-wise classification
    - Determines the anaphoricity of a given anaphor as a by-product of the search for its antecedent
  - BM_STR: BM + syntactic pattern features
  - SCM: selection-then-classification model (Iida et al., 05)
  - SCM_STR: SCM + syntactic pattern features
31
Results of intra-sentential ZAR

- Antecedent identification (accuracy):

  BM (Ng 02)       48.0% (478/995)
  BM_STR           63.5% (632/995)
  SCM (Iida 05)    65.1% (648/995)
  SCM_STR          70.5% (701/995)

- The performance of antecedent identification improved by using syntactic pattern features
32
Results of intra-sentential ZAR

- Antecedent identification + anaphoricity determination
33
Impact on overall ZAR

- Evaluate the overall performance for both intra-sentential and inter-sentential ZAR
- Baseline model: SCM
  - Resolves intra-sentential and inter-sentential zero-anaphora simultaneously, with no syntactic pattern features
34
Results of overall ZAR
35
AUC curve

- AUC (area under the recall-precision curve) plotted while varying θ_intra
  - The curve is not peaky → optimizing the parameter θ_intra is not difficult
36
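The slide above plots the area under the recall-precision curve for different values of θ_intra. As a generic illustration (not the authors' evaluation code), the sketch below computes recall-precision points from a list of (score, is_correct) system decisions and the trapezoidal area under the resulting curve; the data layout and the extension of the curve to recall 0 are assumptions.

```python
# Sketch: recall-precision points obtained by sweeping a decision threshold,
# and the trapezoidal area under the resulting curve. `decisions` is an
# assumed list of (score, is_correct) pairs; n_anaphoric is the gold count.

def recall_precision_curve(decisions, n_anaphoric):
    points = []
    for theta in sorted({s for s, _ in decisions}, reverse=True):
        kept = [ok for s, ok in decisions if s >= theta]
        correct = sum(kept)
        if kept:
            points.append((correct / n_anaphoric, correct / len(kept)))
    return points  # (recall, precision) pairs, recall non-decreasing

def area_under_curve(points):
    """Trapezoidal area; extends the first precision back to recall 0 (a convention)."""
    if not points:
        return 0.0
    area, prev_r, prev_p = 0.0, 0.0, points[0][1]
    for r, p in points:
        area += (r - prev_r) * (p + prev_p) / 2.0
        prev_r, prev_p = r, p
    return area
```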
Conclusion

- We have addressed the issue of how to use syntactic patterns for zero-anaphora resolution:
  - How to encode syntactic pattern features
  - How to find useful sub-trees
- Incorporating syntactic pattern features into our selection-then-classification model improves the accuracy of intra-sentential zero-anaphora resolution, which consequently improves the overall performance of zero-anaphora resolution
37
Future work

- How to find zero-pronouns?
  - Designing a broader framework that interacts with predicate-argument structure analysis
- How to find a globally optimal solution to the set of zero-anaphora resolution problems in a given discourse?
  - Exploring methods such as those discussed by McCallum and Wellner (03)
38