Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task
Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi

Overview of Kyoto-U System

Translation examples:
  J: 図書館で新聞を読む
  E: I read a newspaper in the library
  J: 政治の本が売れ残っている
  E: A book in politics was left on the shelf
  ・・・・・

The same examples as dependency structures (with glosses):
  図書館 で (library, in) / 新聞 を (newspaper, ACC) / 読む (read)
    ⇔ I read a newspaper in the library
  政治 の (politics, in) / 本 が (book, NOM) / 売れ残って いる (left unsold)
    ⇔ a book in politics was left on the shelf
  ・・・・・

Input: 図書館で政治の本を読む。
  図書館 で (in library) / 政治 の (politics in) / 本 を (book ACC) / 読む (read)
Fragments taken from the examples: "I read", "a book", "in politics", "in the library"
Output: I read a book in politics in the library

Alignment

J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.

  J nodes: 交差 点で、 / 突然 / あの / 車が / 飛び出して 来た のです
  E nodes: the car / came / at me / from the side / at the intersection

1. Transformation into dependency structure
   J: JUMAN/KNP
   E: Charniak's nlparser → dependency tree
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
4. Handling of remaining phrases (extension to leaf-nodes)
5. Registration to translation example database

Finding Correspondences
• Bilingual dictionaries (500K entries)
• Substring co-occurrence (Cromieres 2006), a measure based on count(j, e), count(j), and count(e)
• Numeral normalization
  二百十六万 → 2,160,000 ← 2.16 million
• Transliteration (Katakana words, NEs)
  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)

Alignment Ambiguities

  J nodes: 日本 で [in Japan] / 保険 [insurance] / 会社 に 対して [to the company] / 保険 [insurance] / 請求 の [of claim] / 申し立て が [file] / 可能です よ [be able to]
  E nodes: you / will have to file / insurance / an claim / insurance / with the office / in Japan

[Figure: 保険 appears twice on the J side and "insurance" twice on the E side, so the correspondences are ambiguous.]

Alignment: Consistency

$$
\hat{A} = \operatorname*{arg\,max}_{\text{alignment}} \;
\frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\bigl(d_J(a_i, a_j),\, d_E(a_i, a_j)\bigr)}{n(n-1)/2}
$$

• For each pair of candidates a_i and a_j, calculate the J-side distance d_J and the E-side distance d_E
• Give a consistency score to the pair based on d_J and d_E
• Calculate consistency scores for all the pairs in a possible set of alignment candidates

[Figure: candidate pairs that are near on both sides versus near on one side and far on the other.]

Baseline
• Distance of each branch: 1
• Consistency score:

$$
cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}
$$

  Example: 1/1 + 1/2 = 1.5

Consistency Score
• The frequency of each distance pair in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]

[Figure: Frequency (log) plotted against Dist of J-Side and Dist of E-Side.]

Distance based on Dependency Type

  J side                                   Distance
  デ格 (de-case)                            3
  ガ格 (ga-case)                            3
  連用 (adverbial modification)             3
  ノ格 (no-case)                            2
  文節内 (within a bunsetsu)                1

  E side                                   Distance
  NP                                       3
  PP                                       3
  NN                                       1

[Figure: the "insurance claim" sentence pair from Alignment Ambiguities with these distances attached to each branch.]

Example of Alignment Improvement

[Figure: alignment by the proposed model compared with word-base alignment.]

Translation

Selection of Translation Examples
• Score for an example
  1. Size of an example [Sato 91]
  2. Similarity of neighboring nodes
  3. Translation probability
• Beam search from the root of the input

[Figure: scoring a translation example against the input 図書館で政治の本を読む using the weights w_size, w_sim, and w_trans; candidate examples shown are "I read a newspaper in the library" and "I study a newspaper in the library".]

Combination of TMs

Input: 記録領域での変形形状と,記録特性の関係を調べた。

Input dependency tree (nodes):
  記録 / 領域 で の / 変形 / 形状 と , / 記録 / 特性 の / 関係 を / 調べた 。

Translation examples (J nodes ⇔ E nodes):
  状況 を / 調べた 。
    ⇔ the situation / was examined
  相互 / 作用 と / 記録 / 特性 の / 関係 を / 調べた 。
    ⇔ the relationship / interaction and / recording / between characteristics / was investigated
  大変形 / 領域 で の / 断面 / 形状 を / 模擬 した
    ⇔ cross-sectional / shape / large / deformation / in the region / was simulated
  記録 / 領域 の
    ⇔ recording / of the areas
  変形 / パターン を
    ⇔ deformation / the pattern

Output dependency tree (nodes):
  the relationship / deformation / shape and / recording / in the region / recording / between characteristics / was examined

Output: The relationship between deformation shape in the recording region and recording characteristics was examined.

Evaluation Results and Discussion

Intrinsic J-E Evaluation Result (each column is ranked independently)

  BLEU                 Adequacy             Fluency              Average
  27.20 NTT            3.81 tsbmt           4.02 Japio           3.88 tsbmt
  27.14 moses          3.71 Japio           3.94 tsbmt           3.86 Japio
  27.14 MIT            3.15 MIT             3.66 MIT             3.40 MIT
  25.48 NAIST-NTT      2.96 NTT             3.65 NTT             3.30 NTT
  24.79 NICT-ATR       2.85 Kyoto-U         3.55 moses           3.18 moses
  24.49 KLE            2.81 moses           3.44 tori            3.10 Kyoto-U
  23.10 tsbmt          2.66 NAIST-NTT       3.43 NAIST-NTT       3.04 NAIST-NTT
  22.29 tori           2.59 KLE             3.35 Kyoto-U         3.01 tori
  21.57 Kyoto-U        2.58 tori            3.28 HIT2            2.94 KLE
  19.93 mibel          2.47 NICT-ATR        3.28 KLE             2.86 HIT2
  19.48 HIT2           2.44 HIT2            3.09 mibel           2.78 NICT-ATR
  19.46 Japio          2.38 mibel           3.08 NICT-ATR        2.74 mibel
  15.90 TH             1.87 TH              2.42 FDU-MCandWI     2.13 TH
   9.55 FDU-MCandWI    1.75 FDU-MCandWI     2.39 TH              2.08 FDU-MCandWI
   1.41 NTNU           1.08 NTNU            1.04 NTNU            1.06 NTNU

Intrinsic E-J Evaluation Result (each column is ranked independently)

  BLEU                 Adequacy             Fluency              Average
  30.58 moses          3.53 tsbmt           3.69 moses           3.60 tsbmt
  29.15 NICT-ATR       2.90 moses           3.67 tsbmt           3.30 moses
  28.07 NTT            2.74 NTT             3.54 NTT             3.14 NTT
  22.65 Kyoto-U        2.59 NICT-ATR        3.20 NICT-ATR        2.89 NICT-ATR
  17.46 tsbmt          2.42 Kyoto-U         2.54 Kyoto-U         2.48 Kyoto-U

Critical Defect in EJ Translation
• The system did not distinguish whether a child node is a pre-child or a post-child
  – The resulting target structure goes wrong
• After resolving this defect, the BLEU score in EJ translation rose from 22.65 to 24.02
• In the E-J table above, the Kyoto-U BLEU entry becomes 24.02; the corresponding Adequacy, Fluency, and Average values are shown as "?"

Conclusion
• Kyoto-U fully syntactic EBMT system:
  1. Alignment: Consistency
  2. Alignment: Extension
  3. Translation: Discontinuous example
  4. Translation: Easy combination
• By using syntactic information, we could achieve reasonably high quality translation
• For patent translation, we may need some preprocessing to handle special expressions which cause parsing errors
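
As an illustration of the "Alignment: Consistency" and "Baseline" slides, the following is a minimal sketch of how the baseline consistency score could be computed for a set of alignment candidates. It assumes each dependency tree is given as a child-to-parent dictionary, that every branch has distance 1 (the baseline setting), and that no two candidates in a set share a node, so distances are never zero. The names `tree_distance`, `alignment_score`, and `best_alignment` are hypothetical and are not taken from the Kyoto-U system.

```python
from itertools import combinations


def baseline_cs(d_j, d_e):
    """Baseline consistency score from the slides: cs(d_J, d_E) = 1/d_J + 1/d_E."""
    return 1.0 / d_j + 1.0 / d_e


def tree_distance(parent, a, b):
    """Number of branches between nodes a and b (each branch counts as 1).

    `parent` maps every node to its parent; the root maps to None.
    This representation is an assumption made for the sketch.
    """
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node]
            path.append(node)
        return path

    path_a = path_to_root(a)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    # Walk up from b until we hit the first node that is also an ancestor of a.
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def alignment_score(candidates, parent_j, parent_e, cs=baseline_cs):
    """Average pairwise consistency over a set of (j_node, e_node) candidates."""
    n = len(candidates)
    if n < 2:
        return 0.0
    total = sum(
        cs(tree_distance(parent_j, j1, j2), tree_distance(parent_e, e1, e2))
        for (j1, e1), (j2, e2) in combinations(candidates, 2)
    )
    return total / (n * (n - 1) / 2)


def best_alignment(candidate_sets, parent_j, parent_e):
    """Pick the candidate set with the highest average consistency (the arg max)."""
    return max(candidate_sets, key=lambda c: alignment_score(c, parent_j, parent_e))


# Toy trees: the J side is a chain 0 -> 1 -> 2, the E side has 0 and 1 under root 2.
parent_j = {0: 1, 1: 2, 2: None}
parent_e = {0: 2, 1: 2, 2: None}
candidates = [(0, 0), (1, 1), (2, 2)]
score = alignment_score(candidates, parent_j, parent_e)
```

In the toy trees, the pair (0, 0) and (1, 1) has d_J = 1 and d_E = 2, which reproduces the 1/1 + 1/2 = 1.5 example from the Baseline slide. Passing a different `cs` function would be one way to plug in the frequency-based or dependency-type-based scores described on the later slides.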