Toward Structural Language Processing

Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task
Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi
Overview of Kyoto-U System
Translation Examples
J: 図書館で新聞を読む
E: I read a newspaper in the library
J: 政治の本が売れ残っている
E: A book in politics was left on the shelf
・・・・・
Overview of Kyoto-U System
Translation Examples (as dependency trees)
Example 1
  J nodes (gloss): 図書館 で (library in), 新聞 を (newspaper ACC), 読む (read)
  E nodes: I, read, a newspaper, in the library
Example 2
  J nodes (gloss): 政治 の (politics in), 本 が (book NOM), 売れ残って いる (left unsold)
  E nodes: a book, in politics, was left, on the shelf
・・・・・
Overview of Kyoto-U System
Input: 図書館で政治の本を読む。
  Input nodes (gloss): 図書館 で (library in), 政治 の (politics in), 本 を (book ACC), 読む (read)
Translation Examples
  図書館 で / 新聞 を / 読む ⇔ I / read / a newspaper / in the library
  政治 の / 本 が / 売れ残って いる ⇔ a book / in politics / was left / on the shelf
  ・・・・・
Matched fragments: I read a book / in politics / in the library
Output: I read a book in politics in the library
Alignment
J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.
J nodes: 交差 / 点で、 / 突然 / あの / 車が / 飛び出して 来た のです
E nodes: the car / came / at me / from the side / at the intersection
1. Transformation into dependency structure
J: JUMAN/KNP
E: Charniak’s nlparser → Dependency tree
2. Detection of word(s) correspondences
Finding Correspondences
• Bilingual dictionaries (500K entries)
• Substring co-occurrence (Cromieres 2006): count(j, e) normalized by count(j) and count(e)
• Numeral normalization
二百十六万 → 2,160,000 ← 2.16 million
• Transliteration (Katakana words, NEs)
ローズワイン → rosuwain ⇔ rose wine (similarity:0.78)
新宿 → shinjuku ⇔ shinjuku (similarity:1.0)
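The numeral normalization step above can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names `kanji_to_int` and `english_to_int` are hypothetical, only unit-style kanji numerals (十, 百, 千, 万, 億, 兆) are handled, and only a few English scale words are assumed.

```python
import re
from decimal import Decimal

DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
ENGLISH_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Convert a unit-style kanji numeral such as 二百十六万 to an integer."""
    total = 0   # sum of completed 万/億/兆 groups
    group = 0   # value of the current group (below 10,000)
    digit = 0   # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (e.g. 十) counts as one of that unit.
            group += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (group + digit) * LARGE_UNITS[ch]
            group = digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + group + digit


def english_to_int(text: str) -> int:
    """Convert strings such as '2.16 million' or '2,160,000' to an integer."""
    match = re.fullmatch(r"([\d,]*\.?\d+)\s*(thousand|million|billion)?",
                         text.strip().lower())
    if match is None:
        raise ValueError(f"not a recognized numeral: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    scale = ENGLISH_SCALES.get(match.group(2), 1)
    return int(number * scale)
```

Once both sides are mapped to the same integer, 二百十六万 and "2.16 million" can be treated as a correspondence even though no dictionary entry links them.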
3. Disambiguation of correspondences
4. Handling of remaining phrases
Extension to leaf-nodes
5. Registration to translation example database
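Alignment Ambiguities
J nodes [gloss]: 日本 で [in Japan], 保険 [insurance], 会社 に 対して [to the company], 保険 [insurance], 請求 の [of claim], 申し立て が [file], 可能です よ [be able to]
E nodes: you, will have to file, insurance, an claim, insurance, with the office, in Japan
保険 and "insurance" each occur twice, so their correspondences are ambiguous.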
Alignment: Consistency
(figure labels: Near, Far)
\[
\arg\max_{\text{alignment}}
\frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\bigl(d_J(a_i, a_j),\, d_E(a_i, a_j)\bigr)}{n(n-1)/2}
\]
• For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E
• Give a consistency score to the pair based on d_J and d_E
• Calculate consistency scores for all the pairs in a possible set of alignment candidates
Baseline
Distance of Each Branch: 1
Consistency Score: cs(d_J, d_E) = 1/d_J + 1/d_E
Example: 1/1 + 1/2 = 1.5
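The baseline score and the pairwise average can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: each tree is assumed to be a dictionary mapping a node to its head (the root maps to `None`), every branch has distance 1, the candidates in one alignment refer to distinct nodes, and the names `tree_distance`, `baseline_cs` and `consistency` are hypothetical.

```python
from itertools import combinations


def tree_distance(parent: dict, a: str, b: str) -> int:
    """Number of branches on the path between nodes a and b."""
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node]
            path.append(node)
        return path

    steps_from_b = {node: steps for steps, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:          # lowest common ancestor
            return steps_from_a + steps_from_b[node]
    raise ValueError("nodes are not in the same tree")


def baseline_cs(d_j: int, d_e: int) -> float:
    """Baseline consistency score: 1/d_J + 1/d_E."""
    return 1.0 / d_j + 1.0 / d_e


def consistency(alignment, parent_j, parent_e) -> float:
    """Average score over all n(n-1)/2 pairs of (j_node, e_node) candidates."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        return 0.0
    total = sum(
        baseline_cs(tree_distance(parent_j, j1, j2),
                    tree_distance(parent_e, e1, e2))
        for (j1, e1), (j2, e2) in pairs
    )
    return total / len(pairs)


# Toy trees built from the overview example (structure assumed for illustration).
parent_j = {"読む": None, "図書館で": "読む", "新聞を": "読む"}
parent_e = {"read": None, "I": "read",
            "a newspaper": "read", "in the library": "read"}
alignment = [("読む", "read"), ("図書館で", "in the library"),
             ("新聞を", "a newspaper")]
```

Choosing among several possible candidate sets then amounts to something like `max(candidate_sets, key=lambda a: consistency(a, parent_j, parent_e))`.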
Consistency Score
• The frequency of each distance pair in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]
[Figure: frequency (log scale) plotted against J-side distance and E-side distance]
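Distance based on Dependency Type
J-side branch distances: 文節内 (within a bunsetsu) = 1, ノ格 (no-case) = 2, デ格 (de-case) = 3, ガ格 (ga-case) = 3, 連用 (adverbial) = 3
E-side branch distances: NN = 1, NP = 3, PP = 3
J nodes [gloss]: 日本 で [in Japan], 保険 [insurance], 会社 に 対して [to the company], 保険 [insurance], 請求 の [of claim], 申し立て が [file], 可能です よ [be able to]
E nodes: you, will have to file, insurance, an claim, insurance, with the office, in Japan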
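A type-dependent distance can be sketched as a weighted path length. The branch distances below are the ones shown on the slide; everything else is an assumption for illustration: the tree encoding (node mapped to a `(head, dependency_type)` pair), the function name `typed_distance`, and the toy tree, which is not the exact tree from the figure.

```python
BRANCH_DISTANCE = {
    # J side
    "文節内": 1, "ノ格": 2, "デ格": 3, "ガ格": 3, "連用": 3,
    # E side
    "NN": 1, "NP": 3, "PP": 3,
}


def typed_distance(heads: dict, a: str, b: str) -> int:
    """Sum of branch distances on the path between a and b.

    `heads` maps node -> (head, dependency_type); the root maps to (None, None).
    """
    def costs_to_root(node):
        cost, costs = 0, {node: 0}
        while heads[node][0] is not None:
            head, dep_type = heads[node]
            cost += BRANCH_DISTANCE[dep_type]
            costs[head] = cost
            node = head
        return costs

    from_a = costs_to_root(a)
    from_b = costs_to_root(b)
    shared = [node for node in from_a if node in from_b]
    if not shared:
        raise ValueError("nodes are not in the same tree")
    # The lowest common ancestor is the shared node closest to a.
    lca = min(shared, key=from_a.get)
    return from_a[lca] + from_b[lca]


# Hypothetical J-side tree using node labels from the slide.
heads_j = {
    "可能です": (None, None),
    "申し立て が": ("可能です", "ガ格"),
    "請求 の": ("申し立て が", "ノ格"),
    "保険": ("請求 の", "文節内"),
    "日本 で": ("可能です", "デ格"),
}
```

With these weights, a noun inside the same bunsetsu stays close to its head (distance 1), while nodes connected through case-marked branches move apart quickly.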
Example of Alignment Improvement
[Figure: alignment by the proposed model compared with word-base alignment]
Translation
Selection of Translation Examples
• Score for an example
1. Size of an example [Sato 91]
2. Similarity of neighboring nodes
3. Translation probability
• Beam search from the root of the input
Input nodes (gloss): 図書館 で (library in), 政治 の (politics in), 本 を (book ACC), 読む (read)
Translation example: 図書館 で / 新聞 を / 読む ⇔ I / read / a newspaper / in the library
Score: w_size × 2 + w_sim × 0.7 + w_trans × 2/3
Another example (E side): I / study / a newspaper / in the library
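The scoring of a translation example can be sketched as below, assuming the three components are combined as a weighted sum. The weights default to 1.0 purely for illustration, and the names `Candidate`, `example_score` and `keep_best` are hypothetical; the actual weights and search procedure are not given on the slides.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    target: str        # E-side fragment of the translation example
    size: int          # number of input nodes the example covers
    similarity: float  # similarity of the neighboring nodes
    trans_prob: float  # translation probability of the example


def example_score(c: Candidate, w_size: float = 1.0,
                  w_sim: float = 1.0, w_trans: float = 1.0) -> float:
    """Weighted sum of size, neighbor similarity and translation probability."""
    return w_size * c.size + w_sim * c.similarity + w_trans * c.trans_prob


def keep_best(candidates, beam_width: int):
    """Keep the highest-scoring candidates, as one step of a beam search."""
    return heapq.nlargest(beam_width, candidates, key=example_score)


# Values for the first candidate follow the slide (2, 0.7, 2/3);
# the probability of the second candidate is an assumed value.
read = Candidate("I read a newspaper in the library", 2, 0.7, 2 / 3)
study = Candidate("I study a newspaper in the library", 2, 0.7, 1 / 3)
```

Calling `keep_best([study, read], beam_width=1)` keeps the "read" example, because the two candidates differ only in translation probability.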
Combination of TMs
Input: 図書館で政治の本を読む。
  Input nodes (gloss): 図書館 で (library in), 政治 の (politics in), 本 を (book ACC), 読む (read)
Translation Examples
  図書館 で / 新聞 を / 読む ⇔ I / read / a newspaper / in the library
  政治 の / 本 が / 売れ残って いる ⇔ a book / in politics / was left / on the shelf
  ・・・・・
Combined fragments: I read a book / in politics / in the library
Input: 記録領域での変形形状と,記録特性の関係を調べた。
Input Dependency Tree
┌ 記録
┌ 領域 で の
├ 変形
┌ 形状 と ,
│ ┌ 記録
├ 特性 の
┌ 関係 を
調べた 。
Translation Examples
Example 1 (J)
┌ 状況 を
調べた 。
Example 1 (E)
┌ the situation
was examined
Example 2 (J)
┌ 相互
┌ 作用 と
│┌ 記録
├ 特性 の
┌ 関係 を
調べた 。
Example 2 (E)
┌ the relationship
││┌ interaction and
││├ recording
│└ between characteristics
was investigated
Example 3 (J)
┌ 大変
┌形
┌ 領域 で の
├ 断面
┌ 形状 を
模擬 した
Example 3 (E)
┌ cross-sectional
┌ shape
││ ┌ large
││┌ deformation
│└ in the region
was
└ simulated
Example 4 (J)
┌ 記録
領域 の
┌ 変形
パターン を
Example 4 (E)
┌ recording
of the areas
┌ deformation
the pattern
Output Dependency Tree
┌ the relationship
││ ┌ deformation
││┌ shape and
│││ │ ┌ recording
│││ └ in the region
││├ recording
│└ between characteristics
was examined
Output: The relationship between deformation shape in the recording region and recording characteristics was examined.
Evaluation Results and Discussion
Intrinsic J-E Evaluation Result
BLEU                Adequacy            Fluency             Average
27.20 NTT           3.81 tsbmt          4.02 Japio          3.88 tsbmt
27.14 moses         3.71 Japio          3.94 tsbmt          3.86 Japio
27.14 MIT           3.15 MIT            3.66 MIT            3.40 MIT
25.48 NAIST-NTT     2.96 NTT            3.65 NTT            3.30 NTT
24.79 NICT-ATR      2.85 Kyoto-U        3.55 moses          3.18 moses
24.49 KLE           2.81 moses          3.44 tori           3.10 Kyoto-U
23.10 tsbmt         2.66 NAIST-NTT      3.43 NAIST-NTT      3.04 NAIST-NTT
22.29 tori          2.59 KLE            3.35 Kyoto-U        3.01 tori
21.57 Kyoto-U       2.58 tori           3.28 HIT2           2.94 KLE
19.93 mibel         2.47 NICT-ATR       3.28 KLE            2.86 HIT2
19.48 HIT2          2.44 HIT2           3.09 mibel          2.78 NICT-ATR
19.46 Japio         2.38 mibel          3.08 NICT-ATR       2.74 mibel
15.90 TH            1.87 TH             2.42 FDU-MCandWI    2.13 TH
 9.55 FDU-MCandWI   1.75 FDU-MCandWI    2.39 TH             2.08 FDU-MCandWI
 1.41 NTNU          1.08 NTNU           1.04 NTNU           1.06 NTNU
Intrinsic E-J Evaluation Result
BLEU                Adequacy            Fluency             Average
30.58 moses         3.53 tsbmt          3.69 moses          3.60 tsbmt
29.15 NICT-ATR      2.90 moses          3.67 tsbmt          3.30 moses
28.07 NTT           2.74 NTT            3.54 NTT            3.14 NTT
22.65 Kyoto-U       2.59 NICT-ATR       3.20 NICT-ATR       2.89 NICT-ATR
17.46 tsbmt         2.42 Kyoto-U        2.54 Kyoto-U        2.48 Kyoto-U
Critical Defect in EJ Translation
• The system did not distinguish whether a child node is a pre-child or a post-child
– As a result, the target structure came out wrong
• After this defect was fixed, the BLEU score in EJ translation rose from 22.65 to 24.02
BLEU                      Adequacy            Fluency             Average
30.58 moses               3.53 tsbmt          3.69 moses          3.60 tsbmt
29.15 NICT-ATR            2.90 moses          3.67 tsbmt          3.30 moses
28.07 NTT                 2.74 NTT            3.54 NTT            3.14 NTT
22.65 → 24.02 Kyoto-U     ? 2.59 NICT-ATR     ? 3.20 NICT-ATR     ? 2.89 NICT-ATR
17.46 tsbmt               2.42 Kyoto-U        2.54 Kyoto-U        2.48 Kyoto-U
Conclusion
• Kyoto-U Fully Syntactic EBMT system:
1. Alignment: Consistency
2. Alignment: Extension
3. Translation: Discontinuous example
4. Translation: Easy combination
• By using syntactic information, we achieved reasonably high-quality translation
• For patent translation, we may need some preprocessing to handle special expressions that cause parsing errors