
Incorporating Contextual Cues in
Trainable Models for Coreference Resolution
14 April 2003
Ryu Iida
Computational Linguistic Laboratory
Graduate School of Information Science
Nara Institute of Science and Technology
Background
Two approaches to coreference resolution
Rule-based approach
[Mitkov 97, Baldwin 95, Nakaiwa 96, Okumura 95, Murata 97]
Many attempted to encode linguistic cues into rules
Problem:
Further manual refinement is needed, but it will be prohibitively costly
This study was significantly influenced by Centering Theory
[Grosz 95, Walker et al. 94, Kameyama 86]
Best-achieved performance in MUC (Message Understanding Conference):
Precision: roughly 70%
Recall: roughly 60%
Corpus-based machine learning approach
[Aone and Bennett 95, Soon et al. 01, Ng and Cardie 02, Seki 02]
Cost effective
They have achieved a performance comparable to the best-performing
rule-based systems
Problem:
These previous works tend to lack an appropriate reference to the
theoretical linguistic work on coherence and coreference
2
Background
Challenging issue
Achieving a good union between theoretical linguistic
findings and corpus-based empirical methods
3
Outline of this Talk
Background
Problems with previous statistical approaches
Two methods
Centering features
Tournament-based search model
Experiments
Conclusions
4
Statistical approaches [Soon et al. '01, Ng and Cardie '02]
Reach a level of performance comparable to state-of-the-art rule-based systems
Recast the task of anaphora resolution as a sequence of
classification problems
5
Statistical approaches [Soon et al. '01, Ng and Cardie '02]
[MUC-6]
A federal judge in Pittsburgh issued a temporary restraining
order preventing Trans World Airlines from buying additional
shares of USAir Group Inc.
The order, requested in a suit filed by USAir, dealt another
blow to TWA's bid to buy the company for $52 a share.
(anaphor: USAir; antecedent: USAir Group Inc 〇; order ×, suit ×)
The task is to classify these pairs of noun phrases as positive
or negative:
positive instance: Pair of an anaphor and the antecedent
negative instance: Pairs of an anaphor and the NPs located
between the anaphor and the antecedent
candidate          anaphor    output class
USAir Group Inc    USAir      positive
order              USAir      negative
suit               USAir      negative
6
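A minimal sketch of this instance creation in Python (not the authors'
implementation; the function name and NP list are illustrative):

    def make_instances(nps, antecedent_idx, anaphor_idx):
        """nps: noun phrases in textual order.
        Returns (candidate, anaphor, class) triples: one positive pair,
        plus one negative pair per NP between antecedent and anaphor."""
        anaphor = nps[anaphor_idx]
        instances = [(nps[antecedent_idx], anaphor, "positive")]
        for i in range(antecedent_idx + 1, anaphor_idx):
            instances.append((nps[i], anaphor, "negative"))
        return instances

    nps = ["USAir Group Inc", "order", "suit", "USAir"]
    print(make_instances(nps, antecedent_idx=0, anaphor_idx=3))
    # [('USAir Group Inc', 'USAir', 'positive'),
    #  ('order', 'USAir', 'negative'),
    #  ('suit', 'USAir', 'negative')]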
Statistical approaches [Soon et al. '01, Ng and Cardie '02]
Feature set [Ng and Cardie 02]
Features: POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER,
SEM_CLASS, DISTANCE, SYNTACTIC ROLE
Each pair of an antecedent candidate and the anaphor is encoded as a
feature vector (e.g. Prp_noun:1, Organization:1, Person:1,
SENT_DIST:0, wa:1, STR_MATCH:0, Pronoun:0/1) and labeled with its
output class:
USAir Group Inc    USAir    positive
order              USAir    negative
suit               USAir    negative
The labeled vectors are used for training with C4.5, yielding the
model (a decision tree).
7
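A sketch of this training step, assuming scikit-learn's
DecisionTreeClassifier as a stand-in for C4.5 and a toy subset of the
features (the feature names and values here are illustrative only):

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier

    # One dictionary of features per (candidate, anaphor) pair.
    train = [
        ({"Prp_noun": 1, "Organization": 1, "SENT_DIST": 0}, "positive"),
        ({"Prp_noun": 0, "Organization": 0, "SENT_DIST": 0}, "negative"),
        ({"Prp_noun": 0, "Organization": 0, "SENT_DIST": 0}, "negative"),
    ]
    vec = DictVectorizer()
    X = vec.fit_transform([features for features, _ in train])
    y = [label for _, label in train]
    model = DecisionTreeClassifier().fit(X, y)  # C4.5 in the original work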
Statistical approaches [Soon et al. '01, Ng and Cardie '02]
Test Phase [Ng and Cardie 02]
Extract NPs as antecedent candidates (NP1 ... NP8)
Input each pair of the given anaphor and one of these candidates to
the decision tree
Select the best-scored candidate as the output
(scores: NP1 -2.0, NP2 -1.1, NP3 -0.4, NP4 -1.0, NP5 -3.5, NP6 1.5,
NP7 -0.3, NP8 -2.5; NP6 is selected as the antecedent)
We refer to Ng and Cardie's model as the baseline of our empirical
evaluation
Precision 78.0%, Recall 64.2%
Slightly better than the best-performing rule-based model at MUC-7
8
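A sketch of the baseline search: score every (candidate, anaphor) pair
and output the best-scored candidate. `score_pair` is a hypothetical
stand-in for the trained classifier's confidence value:

    def select_antecedent(candidates, anaphor, score_pair):
        return max(candidates, key=lambda cand: score_pair(cand, anaphor))

    # Toy scores copied from the figure above:
    scores = {"NP1": -2.0, "NP2": -1.1, "NP3": -0.4, "NP4": -1.0,
              "NP5": -3.5, "NP6": 1.5, "NP7": -0.3, "NP8": -2.5}
    print(select_antecedent(list(scores), "ANP", lambda c, a: scores[c]))
    # NP6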
A drawback of the previous statistical models
The previous models do not capture local context appropriately
Sarah went downstairs and received another curious shock, for
when Glendora flapped into the dining room in her home made
moccasins, Sarah asked her when she had brought coffee to her
room, and Glendora said she hadn't. [Kameyama 98]
(anaphor: she; antecedent: Glendora)
Sarah       she    negative
Glendora    she    positive
Positive and negative instances may have identical feature vectors;
for both candidates the features are:
POS: Noun, Prop_Noun: Yes, Pronoun: No, NE: PERSON,
SEM_CLASS: Person, SENT_DIST: 0
9
Two methods
Two methods
Use more sophisticated linguistic cues: centering features
Augment the feature set with new features, inspired by Centering
Theory, that implement local contextual factors
Improve the search algorithm: tournament model
A new model which makes pair-wise comparisons between
candidates
11
Centering Features
Sarah       she    negative
Glendora    she    positive
The problem is that the current feature set does not tell the
difference between these two candidates: both have the identical
vector POS: Noun, Prop_Noun: Yes, Pronoun: No, NE: PERSON,
SEM_CLASS: Person, SENT_DIST: 0
Centering analysis of the example:
Sarah went downstairs and received another curious shock,
CHAIN(Cb = Cp = Sarah)
…… (transition) ……
she hadn't.
CHAIN(Cb = Cp = Glendora)
(antecedent: Glendora)
Introduce extra devices such as the forward-looking center list
Encode state transitions on them into a set of additional features
12
Two methods
Use more sophisticated linguistic cues: centering features
We augment the feature set with new features, inspired by Centering
Theory, that implement local contextual factors
Improve the search algorithm: tournament model
We propose a new model which makes pair-wise
comparisons between antecedent candidates
13
Tournament model
What we want to do is to answer the question: which is more likely
to be coreferent, Sarah or Glendora?
Sarah went downstairs and received another curious shock, for
when Glendora flapped into the dining room in her home made
moccasins, Sarah asked her when she had brought coffee to her
room, and Glendora said she hadn't.
(mentions of each candidate are marked 〇 or × on the slide according
to whether they corefer with the anaphor "she")
Conduct a tournament consisting of a series of matches in which
candidates compete with each other
Match victory is determined by a pairwise comparison between
candidates, treated as a binary classification problem
The most likely candidate is selected through a single-elimination
tournament of matches
14
Tournament model
Training Phase
In the tournament, the correct antecedent NP5 must prevail over any
of the other four candidates
Extract four training instances:
left    right    anaphor    class
NP1     NP5      ANP        right
NP4     NP5      ANP        right
NP5     NP7      ANP        left
NP5     NP8      ANP        left
Induce a pairwise classifier from the set of extracted training
instances
The classifier classifies a given pair of candidates into left or
right; the class names the side of the pair that wins (is more
likely to be the antecedent)
(candidates NP1 ... NP8 precede the anaphor ANP, counting from the
beginning of the document; NP5 is coreferent with ANP)
15
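A sketch of this training-instance extraction, under the labeling
convention on the slide (the class names the side of the pair holding
the correct antecedent); the function name is illustrative:

    def make_tournament_instances(candidates, antecedent, anaphor):
        """candidates: NPs in textual order, including the antecedent."""
        idx = candidates.index(antecedent)
        instances = []
        for i, cand in enumerate(candidates):
            if i == idx:
                continue
            if i < idx:    # the wrong candidate precedes the antecedent
                instances.append((cand, antecedent, anaphor, "right"))
            else:          # the wrong candidate follows the antecedent
                instances.append((antecedent, cand, anaphor, "left"))
        return instances

    for inst in make_tournament_instances(
            ["NP1", "NP4", "NP5", "NP7", "NP8"], "NP5", "ANP"):
        print(inst)
    # ('NP1', 'NP5', 'ANP', 'right')
    # ('NP4', 'NP5', 'ANP', 'right')
    # ('NP5', 'NP7', 'ANP', 'left')
    # ('NP5', 'NP8', 'ANP', 'left')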
Tournament model
Test Phase
1. The first match is arranged between the two candidates nearest
the anaphor (NP7 and NP8)
2. Each of the following matches is arranged in turn between the
winner of the previous match (NP8) and a new challenger (NP5)
(candidates NP1 ... NP8 precede the anaphor ANP; NP5 is coreferent
with ANP)
16
Tournament model
Test Phase (continued)
3. The winner is next matched against the next challenger (NP4)
4. This process is repeated until the last candidate has
participated
5. The model selects the candidate that prevails through the final
round as the answer (here NP5, the correct antecedent)
17
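Steps 1-5 in a minimal sketch; `classify` is a hypothetical pairwise
classifier returning "left" or "right":

    def run_tournament(candidates, anaphor, classify):
        """candidates: NPs in textual order, nearest to the anaphor last."""
        winner = candidates[-1]                   # e.g. NP8
        for challenger in reversed(candidates[:-1]):
            # the challenger always precedes the current winner in the text
            if classify(challenger, winner, anaphor) == "left":
                winner = challenger
        return winner

    # With a classifier that always prefers NP5, the tournament returns it:
    cands = ["NP1", "NP4", "NP5", "NP7", "NP8"]
    prefer_np5 = lambda left, right, a: "left" if left == "NP5" else "right"
    print(run_tournament(cands, "ANP", prefer_np5))  # NP5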
Experiments
Experiments
Empirical evaluation on Japanese zero-anaphora
resolution
Japanese does not normally use personal pronouns as anaphors;
instead, it uses zero pronouns
Comparison among four models:
1. Baseline model
2. Baseline model with Centering Features
3. Tournament model
4. Tournament model with Centering Features
19
Centering Features in Japanese
Japanese anaphora resolution model [Nariyama 02]
An expansion of Kameyama's work on applying Centering Theory to
Japanese zero-anaphora resolution
Expands the original forward-looking center list into the Salience
Reference List (SRL) to take broader contextual information into
account
Makes more use of linguistic information
In the experiments, we introduced two features to reflect
the SRL-related contextual factors
20
Method
Data
GDA-tagged Japanese newspaper article corpus
                                     GDA       MUC-6
Texts                                2,176     60
Sentences                            24,475    8,946
Tags of anaphoric relations          14,743    -
Tags of ellipsis (zero-anaphors)     5,966     0
As a preliminary test, we resolve only subject zero-anaphors,
2,155 instances in total
We conduct five-fold cross-validation on this data set with
support vector machines
21
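The evaluation setup as a sketch; the exact SVM package and kernel used
in the experiments are not stated on the slide, so scikit-learn's SVC
with toy data is only a stand-in:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X = np.random.rand(20, 5)        # toy stand-in feature vectors
    y = np.array([0, 1] * 10)        # toy binary labels (e.g. left/right)
    scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
    print(scores.mean())             # mean accuracy over the five folds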
Feature set (see our paper for details)
1. Features simulating Ng and Cardie's feature set:
POS
Pronoun
Particle
Named entity
Semantic class
Animacy
Selectional restrictions
Distance between the anaphor and the candidate
Number of anaphoric relations
2. Centering features:
Order in the SRL
Heuristic rule of preference
3. Features capturing the relations between two candidates
(introduced only in the tournament model, not in the baseline model):
Preference in the SRL between the two candidates
Preference of animacy between the two candidates
Distance between the two candidates
22
Results
(Graph: accuracy vs. training data size for the four models:
Baseline model, Baseline model + Centering Features, Tournament
model, Tournament model + Centering Features)
23
Results (1/3): the effect of incorporating centering features
Baseline model + Centering Features: 67.0%
Baseline model: 64.0%
The centering features were reasonably effective
24
Results (2/3)
Tournament model: 70.8%
Baseline model + Centering Features: 67.0%
Baseline model: 64.0%
Introducing the tournament model significantly improved the
performance regardless of the size of the training data
25
Results (3/3)
Tournament model: 70.8%
Tournament model + Centering Features: 69.7%
Baseline model + Centering Features: 67.0%
Baseline model: 64.0%
The most complex model did not outperform the tournament model
without centering features
However, the improvement ratio of this model with respect to the
data size is the best of all four models
26
Results after cleaning data (March '03)
Tournament model + Centering Features: 74.3%
Tournament model: 72.5%
On the cleaned data, the tournament model with centering features
is more effective than the one without centering features
27
Conclusions
Our concern is achieving a good union between theoretical
linguistic findings and corpus-based empirical methods
We presented a trainable coreference resolution model
that is designed to incorporate contextual cues
by means of centering features and a tournament-based
search algorithm.
These two improvements worked effectively in our
experiments on Japanese zero-anaphora resolution.
28
Future Work
In Japanese zero-anaphora resolution,
1. Identification of relations between the topic and subtopics
2. Analysis of complex and quoted sentences
3. Refinement of the treatment of selectional restrictions
29
30
Tournament model
Training Phase
In the tournament, the correct antecedent NP5 must prevail over any
of the other four candidates
Extract four training instances:
left    right    anaphor    class
NP1     NP5      ANP        right
NP4     NP5      ANP        right
NP5     NP7      ANP        left
NP5     NP8      ANP        left
Induce a pairwise classifier from the set of extracted training
instances
(Figure: candidates NP1 ... NP8 precede the anaphor ANP from the
beginning of the document; NP5 is coreferent with ANP)
31
Tournament model
Test Phase
A tournament consists of a series of matches in which candidates
compete with each other
(Figure: candidates NP1 ... NP8 precede the anaphor ANP; matches
proceed from the candidates nearest the anaphor toward the beginning
of the document, and NP5 prevails as the selected antecedent)
32
Tournament model
What we want to do is to answer the question: which is more likely
to be coreferent, Sarah or Glendora?
Sarah went downstairs and received another curious shock, for
when Glendora flapped into the dining room in her home made
moccasins, Sarah asked her when she had brought coffee to her
room, and Glendora said she hadn't.
CHAIN(Cb = Cp = Sarah) …… (transition) …… CHAIN(Cb = Cp = Glendora)
Implement a pairwise comparison between candidates as a binary
classification problem
(Glendora prevails over Sarah as the antecedent of "she")
33
Tournament model
Sarah went downstairs and received another
curious shock, for when Glendora flapped into
the dining room in her home made moccasins,
Sarah asked her when she had brought coffee to
her room, and Glendora said she hadn't.
Training Phase
Extract NPs from the passage as candidates (She, downstairs,
Glendora, moccasins, Sarah, her, she, coffee, room, ...)
Training instances (anaphor: she; the preferred candidate in each
pair is marked by "<"):
downstairs < Glendora
moccasins  < Glendora
coffee     < Glendora
Sarah      < Glendora
room       < Glendora
Output class: Glendora is coreferent with the anaphor "she"
34
Conclusions
To incorporate linguistic cues into trainable approaches:
Add features which take linguistic cues such as Centering Theory
into consideration: centering features
Propose a novel search model in which the candidates are compared
in terms of their likelihood as antecedents: tournament model
In the Japanese zero-anaphora resolution task,
the tournament model significantly outperforms earlier machine
learning approaches [Ng and Cardie 02]
Incorporating linguistic cues into machine learning models is
effective
35
Data
GDA-tagged Japanese newspaper article corpus
                                     GDA       MUC-6
Texts                                2,176     60
Sentences                            24,475    8,946
Tags of anaphoric relations          14,743    -
Tags of ellipsis (zero-anaphors)     5,966     0
coreferent
<n id="tagid1">クリントン米大統領</n>の内政の最大課題のひとつである
<n id="tagid2">包括犯罪対策法案</n>が十一日の下院本会議で、審議・表決に
移ることを承認する動議が、反対二二五対賛成二一〇で否決された。これで
<n eq="tagid2">同法案</n>は事実上、大幅修正または廃案に追い込まれた。
<n eq="tagid1">同大統領</n>は緊急会見で怒りをあらわにして、法案の復活を
要求。<n eq="tagid1">同大統領</n>は中間選挙を前に得点を
<v agt="tagid1">あげる</v>ことを目指したが、逆に大きな痛手を受けた。
(Gloss: a motion to advance <n id="tagid1">US President Clinton</n>'s
<n id="tagid2">omnibus crime bill</n>, one of his top domestic
priorities, to debate and a vote was rejected in the House 225 to
210; with this, the bill was effectively forced into major revision
or abandonment; the President angrily demanded its revival; the
President had aimed to (φ) score points before the midterm elections
but instead suffered a major blow. Tags with eq mark NPs coreferent
with the antecedent carrying the matching id; agt="tagid1" on the
verb marks its elided AGENT, a zero-anaphor.)
Ellipsis (AGENT)
Extracted 2,155 examples
36
Statistical approaches [Soon et al. 01, Ng and Cardie 02]
Reach a level of performance comparable to state-of-the-art
rule-based systems
Recast the task of anaphora resolution as a sequence of
classification problems: classify pairs of noun phrases as positive
or negative
[MUC-6]
A federal judge in Pittsburgh issued a temporary restraining
order preventing Trans World Airlines from buying additional
shares of USAir Group Inc.
The order, requested in a suit filed by USAir, dealt another
blow to TWA's bid to buy the company for $52 a share.
Pair of an anaphor and the antecedent: positive instance
Pairs of an anaphor and the NPs located between the anaphor and
the antecedent: negative instances
candidate          anaphor    output class
USAir Group Inc    USAir      positive
order              USAir      negative
suit               USAir      negative
37
*Centering Features
Centering Theory [Grosz 95, Walker et al. 94, Kameyama, 86]
Part of an overall theory of discourse structure and meaning
Two levels of discourse coherence: global and local
Centering models the local-level component of
attentional state
e.g. Intrasentential centering [Kameyama 97]
Sarah went downstairs and received another curious shock, for
when Glendora flapped into the dining room in her home made
moccasins, Sarah asked her when she had brought coffee to her
room, and Glendora said she hadn't.
38
*Centering Features in English [Kameyama 97]
CHAIN(Cb = Cp = Sarah):
Sarah went downstairs and received another curious shock,
ESTABLISH(Cb = Cp = Glendora):
for when Glendora flapped into the dining room in her home made moccasins,
CHAIN(Cb = Glendora, Cp = Sarah):
Sarah asked her
CHAIN(Cb = Cp = Glendora):
when she had brought coffee to her room,
CHAIN(Cb = NULL, Cp = Glendora):
and Glendora said
CHAIN(Cb = Cp = Glendora):
she hadn't.
[Kameyama 97]
39
*Centering Features in English [Kameyama 97]
The essence is that it takes the preference between candidates
into account
Sarah went downstairs and received another curious shock,
CHAIN(Cb = Cp = Sarah)
…… (transition) ……
she hadn't.
CHAIN(Cb = Cp = Glendora)
Cb and Cp distinguish the two candidates
Implement local contextual factors: centering features
40
*Tournament model
Test Phase
(Figure: a tournament for the anaphor "she" over the candidates
She, downstairs, shock, Glendora, room, her, moccasins, Sarah, and
coffee; Glendora prevails through every match and is selected as
the antecedent)
A tournament consists of a series of matches in which candidates
compete with each other
41
Rule-based Approaches
Encoding linguistic cues into rules manually:
Thematic roles of the candidates
Order of the candidates
Semantic relation between anaphors and antecedents
etc.
These approaches are influenced by Centering Theory
[Grosz 95, Walker et al. 94, Kameyama 86]
Further manual refinement of rule-based models will be
prohibitively costly
The Coreference Resolution Task of the Message Understanding
Conference (MUC-6 / MUC-7):
Precision: roughly 70%
Recall: roughly 60%
42
Statistical Approaches with Tagged-Corpus
The statistical approaches have achieved a performance comparable
to the best-performing rule-based systems
But they lack an appropriate reference to theoretical linguistic
work on coherence and coreference
Making a good marriage between theoretical linguistic findings and
corpus-based empirical methods
43
*Test Phase [Soon et al. 01]
Extract NPs as candidates: judge, Pittsburgh, order, Trans World
Airlines, share, USAir Group Inc, order, a suit
A federal judge in Pittsburgh issued a temporary restraining order
preventing Trans World Airlines from buying additional shares of
USAir Group Inc.
The order, requested in a suit filed by USAir, dealt another blow
to TWA's bid to buy the company for $52 a share.
For the anaphor USAir, candidates are tested from the nearest one
backwards: a suit (×), order (×), USAir Group Inc (〇, selected as
the antecedent)
Precision 67.3%, Recall 58.6% on the MUC data set
44
Improving Soon’s model
[Ng and Cardie 02]
Expanding the feature set: 12 features ⇒ 53 features
(e.g. POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER, SEM_CLASS,
DISTANCE, SYNTACTIC ROLE)
Introducing a new search algorithm
45
Task of Coreference Resolutions
Two processes:
Resolution of anaphors
Resolution of antecedents
Applications: machine translation, IR, etc.
[MUC-6]
A federal judge in Pittsburgh issued a temporary restraining
order preventing Trans World Airlines from buying
additional shares of USAir Group Inc.
The order, requested in a suit filed by USAir, dealt another
blow to TWA's bid to buy the company for $52 a share.
(NPs shown in the same color on the slide are coreferent, e.g. the
anaphor "The order" and its antecedent "a temporary restraining
order")
47
Future Work
Evaluate some examples
The tournament model does not deal with direct quotes:
…… 獄に下るモハンメドは妻にこう言い残した。「おれが刑務所にいる間、外で働いてはいけない」。貞節を守れ、という意味だ。さすがに刑務所で新しい子供に恵まれる可能性はないと思ったのだろうか。
(Gloss: Mohammed, on his way to prison, left his wife these words:
"While I am in prison, you must not work outside." He meant that she
should remain faithful; perhaps he thought there was no chance of
being blessed with a new child while he was in prison.)
SRL at this point:
Topic: モハンメド (Mohammed), Focus: おれ (I), I-Obj: 刑務所 (prison),
D-Obj: NULL, Others: 外 (outside)
The proposed methods cannot deal with topics across different
discourse structures
48
Centering Features of Japanese
Adding the likelihood of antecedents into the features
In Japanese, wa-marked NPs tend to be topics, and topics tend to be
omitted
Salience Reference List (SRL) [Nariyama 02]
Store NPs in the SRL from the beginning of the text
Overwrite the old entity if a new entity fills the same slot
Preference: Topic/φ (wa) > Focus (ga) > I-Obj (ni) > D-Obj (wo) > Others
Example: …NP1-wa NP2-wo…。 …NP3-ga…、NP4-wa…。 …NP5-ni……(φ-ga)V。
Topic:  NULL → NP1 → NP4
Focus:  NULL → NP3
I-Obj:  NULL → NP5
D-Obj:  NULL → NP2
Others: NULL
(slots further to the left are preferred)
49
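A sketch of this SRL bookkeeping (the slot names and marker-to-slot
map follow the slide; the code itself is illustrative):

    SRL_ORDER = ["Topic", "Focus", "I-Obj", "D-Obj", "Others"]
    MARKER_TO_SLOT = {"wa": "Topic", "ga": "Focus",
                      "ni": "I-Obj", "wo": "D-Obj"}

    def update_srl(srl, np_, marker):
        slot = MARKER_TO_SLOT.get(marker, "Others")
        srl[slot] = np_        # overwrite the old entity in the same slot
        return srl

    srl = {slot: None for slot in SRL_ORDER}
    for np_, marker in [("NP1", "wa"), ("NP2", "wo"),
                        ("NP3", "ga"), ("NP4", "wa"), ("NP5", "ni")]:
        update_srl(srl, np_, marker)
    print([srl[s] for s in SRL_ORDER])
    # ['NP4', 'NP3', 'NP5', 'NP2', None]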
Evaluation of models
Introduce a confidence measure
The confidence coefficient is the classifier's output value for the
match between the two nearest candidates in the tournament
(Figure: a tournament for the anaphor "he" over the candidates
President A, armistice, President B, this, and action, with match
values 0.9, 2.4, 3.2, and 3.8)
50
Evaluation of Tournament model
We investigate the tournament model, the model with the best
performance
51
Centering Features
Example:
President A proposed the armistice,
ドゥダエフ大統領は、正月休戦を提案したが、
but President B ignored this.
エリツィン・ロシア大統領はこれを黙殺し、
And he started action.
（φ-ga）行動を開始した。
(President A = ドゥダエフ大統領, President Dudayev; President B =
エリツィン・ロシア大統領, Russian President Yeltsin; 行動 = action)
SRL after the first clause:
ドゥダエフ大統領 > NULL > NULL > NULL > NULL
SRL after the later clauses:
エリツィン・ロシア大統領 > NULL > NULL > 行動 > NULL
(the zero pronoun φ is resolved to the most preferred entity in the
SRL: エリツィン・ロシア大統領)
52
*Features (1/3) Ng’s model, Tournament model
Features determined by a single candidate:
POS
Pronoun
Particle
Named entity
Number of anaphoric relations
First NP in a sentence
Order in the SRL
53
*Features (2/3) Ng’s model, Tournament model
Features determined by the pair of the anaphor and the candidate:
Selectional restrictions:
whether the pair of candidate and anaphor satisfies the
selectional restrictions in Nihongo Goi Taikei
log-likelihood ratio calculated from cooccurrence data
Distance in sentences between the anaphor and the candidate
54
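The slide does not spell out the log-likelihood ratio; one common
reading is Dunning's G² over a 2x2 predicate-argument cooccurrence
table, sketched here under that assumption with hypothetical counts:

    import math

    def llr(k11, k12, k21, k22):
        """Dunning-style G^2 for a 2x2 cooccurrence contingency table."""
        def h(*ks):            # sum of k*log(k/n) over nonzero counts
            n = sum(ks)
            return sum(k * math.log(k / n) for k in ks if k > 0)
        return 2 * (h(k11, k12, k21, k22)
                    - h(k11 + k12, k21 + k22)    # row sums
                    - h(k11 + k21, k12 + k22))   # column sums

    # k11: predicate and argument cooccur; the other cells hold the
    # remaining counts of the (hypothetical) cooccurrence table.
    print(llr(10, 20, 30, 940))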
*Features (3/3) only Tournament model
Features determined by the relation between the two candidates:
Distance in sentences between the two candidates
Animacy: whether or not a candidate belongs to the class PERSON or
ORGANIZATION
Which candidate is preferred in the SRL
(e.g. SRL: Topic: NULL, Focus: NP1, I-Obj: NULL, D-Obj: NP2,
Others: NULL, so NP1 is preferred over NP2)
55
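A sketch of the SRL-preference feature between two candidates (the
candidate occupying the higher-ranked slot is preferred; names are
illustrative):

    SRL_ORDER = ["Topic", "Focus", "I-Obj", "D-Obj", "Others"]

    def srl_preference(srl, cand1, cand2):
        def rank(cand):
            for i, slot in enumerate(SRL_ORDER):
                if srl.get(slot) == cand:
                    return i
            return len(SRL_ORDER)        # candidate absent from the SRL
        return "cand1" if rank(cand1) < rank(cand2) else "cand2"

    srl = {"Topic": None, "Focus": "NP1", "I-Obj": None,
           "D-Obj": "NP2", "Others": None}
    print(srl_preference(srl, "NP1", "NP2"))  # cand1: Focus outranks D-Obj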
Anaphoric relations
Endophora: the antecedent exists in the context
  Anaphora: antecedents precede anaphors
  Cataphora: anaphors precede antecedents
Exophora: the antecedent does not exist in the context
Variety of antecedents: noun phrase (NP), sentence, text, etc.
Most previous work deals with anaphora resolution, since
antecedent-anaphor examples are the most numerous of all
56
Results (examples: 2,155 ⇒ 2,681)
We modified some tagging errors by hand, increasing the number of
examples from 2,155 to 2,681
(Graph: Ng's model, Ng's model + Centering Features, Tournament
model, Tournament model + Centering Features)
On this data, the tournament model using centering features
performs worse than the tournament model without centering features
57