Structural Phrase Alignment Based on Consistency Criteria

Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi (Graduate School of Informatics, Kyoto University)
{nakazawa, kunyu}@nlp.kuee.kyoto-u.ac.jp [email protected]

Flow of Our EBMT System

[Figure: flow of the EBMT system. Input: 交差点に入る時私の信号は青でした。, parsed as 交差 (cross) / 点に (point) / 入る (enter) / 時 (when) / 私の (my) / 信号は (signal) / 青 (blue) / でした。 (was). Translation Examples shown: 交差 (cross) 点で、 (point) 突然 (suddenly) 飛び出して来たのです。 (rush out); 家に (house) 入る (enter) 時 (when) 脱ぐ (put off); 私の (my) サイン (signal); 信号は (signal) 青 (blue) でした。 (was), with English fragments such as "at the intersection", "when entering a house", "my signature", "The light was green". Translation Examples and Language Models feed the Output: "My traffic light was green when entering the intersection."]

Core Steps of Alignment

• Searching Correspondence Candidates
– Fine-grained alignment is effective for translation
– Search for as many candidates as possible using a variety of linguistic information:
  • Bilingual dictionaries
  • Transliteration (Katakana words, NEs)
    ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
    新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
  • Numeral normalization
    二百十六万 → 2,160,000 ← 2.16 million
  • Japanese flexible matching (Odani et al. 2007)
  • Substring co-occurrence measure (Cromieres 2006)

• Selecting Correspondence Candidates
– More candidates lead to more ambiguities and improper alignments
– A robust alignment method is needed, one that aligns parallel sentences consistently by selecting an adequate set of candidates

Selecting Correspondence Candidates Using Consistency Score and Dependency Type

[Figure: candidate correspondences between the Japanese dependency tree 日本で (in Japan) / 保険 (insurance) / 会社に対して (to company) / 保険 (insurance) / 請求の (claim) / 申し立てが (instance) / 可能ですよ (you can) and the English tree you / will have to file / an claim / insurance / with the office / insurance / in Japan. Callouts: "Ambiguities!", "Near!", "Far!", "Improper alignments!".]

Baseline score for a pair of correspondences: $1/d_J + 1/d_E$, e.g. 1/1 + 1/2 = 1.5.

How to reflect the inconsistency? Score each pair of correspondences with a consistency score $cs(d_J, d_E)$ of the two distances.
$$\hat{a} = \arg\max_{\text{alignment}} \sum_{i} \sum_{j} cs\bigl(d_J(a_i, a_j),\; d_E(a_i, a_j)\bigr)$$

Here $d_J$ is the J-side distance, $d_E$ is the E-side distance, and $cs$ is the consistency score.

Dependency Type Distance

J-Side Distance
  predicate: level C       6
  predicate: level B+/B    5
  predicate: level B-/A    4
  predicate: level A-      3
  case no / rentai         2
  Inside clause            1
  Others                   3

E-Side Distance
  S / SBAR / SQ …                                      5
  VP / WHADVP / WHADJP                                 4
  ADVP / ADJP / NP / PP / INTJ / QP / PRT / PRN        3
  Others                                               1

[Figure: distribution of the distance of alignment pairs in hand-annotated data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]. Axes: Dist of J-Side, Dist of E-Side, Frequency (log).]

Consistency Score Function

[Figure: consistency score surface. Axes: J-Side Distance, E-Side Distance, Score.]

• "Near-Near" pair → Positive Score
• "Far-Far" pair → 0
• "Near-Far" pair → Negative Score

[Figure: the Japanese and English trees annotated with dependency types and distances. Japanese: 日本で (in Japan) [case "de"] デ格 3; 保険 (insurance) [inside clause] 文節内 1; 会社に対して (to company) [renyou] 連用 3; 保険 (insurance) [inside clause] 文節内 1; 請求の (claim) [case "no"] ノ格 2; 申し立てが (instance) [case "ga"] ガ格 3; 可能ですよ (you can). English: you NP 3; will have to file; an claim NP 3; insurance NN 1; with the office PP 3; insurance NN 1; in Japan PP 3.
Pair 1: (Ds, Dt) = (1, 1) → Positive Score
Pair 2: (Ds, Dt) = (1, 7) → Negative Score]

Experimental Result

• 500 test sentences from Mainichi newspaper parallel corpus
• Bilingual dictionary: KENKYUSYA J-E/J-E 500K entries
• Evaluation criteria: Precision / Recall / F-measure
• Character-based for Japanese, word-based for English

  Method                      Pre     Rec     F
  Baseline                    77.47   64.32   70.29
  +Consistency Score          80.30   66.90   72.99
  Proposed(+CS,+DpndType)     80.77   69.14   74.51
  Filtering (80%)             82.48   71.31   76.49
  Moses (SMT Toolkit)*        60.19   33.15   42.75
  Manual (upper bound)        95.58   89.80   92.60

* Using 300K newspaper domain bi-sentences for training

Quality of Other Language Pairs (AER)

Sources: HLT-NAACL 2003, ACL 2005, (Gildea, 2003), GIZA++

  English-French      5.71   15.89
  English-Romanian    28.86  26.55  27.19
  English-Korean      32     35

Conclusion

• Proposed a new phrase alignment method using consistency criteria.
• Achieved sufficient alignment accuracy compared with other language pairs.
• We need to acquire the parameters automatically by machine learning.
• We plan to extend the framework so that it revises the parse result.

(There is a translation demo in the exhibition corner by NICT that uses our system!)
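
As a rough illustration of the consistency score idea ("Near-Near" positive, "Far-Far" zero, "Near-Far" negative), the Python sketch below scores candidate alignments by summing a toy score over all pairs of correspondences and keeps the best one. Everything in it is an assumption made for illustration: the threshold `NEAR_MAX`, the score values +1 / 0 / -1, the function names, and the use of position differences in place of dependency type distances. The poster derives its actual score function from hand-annotated data, and this sketch does not reproduce it.

```python
from itertools import combinations

# Assumption: distances up to this value count as "near". Not from the poster.
NEAR_MAX = 3


def consistency_score(d_j, d_e, near_max=NEAR_MAX):
    """Toy stand-in for cs(d_J, d_E): +1 near-near, 0 far-far, -1 near-far."""
    j_near = d_j <= near_max
    e_near = d_e <= near_max
    if j_near and e_near:
        return 1.0
    if not j_near and not e_near:
        return 0.0
    return -1.0


def alignment_score(alignment, dist_j, dist_e):
    """Sum the toy consistency score over every pair of correspondences."""
    return sum(
        consistency_score(dist_j(a, b), dist_e(a, b))
        for a, b in combinations(alignment, 2)
    )


def select_alignment(candidates, dist_j, dist_e):
    """Return the candidate alignment with the highest total score."""
    return max(candidates, key=lambda al: alignment_score(al, dist_j, dist_e))


def linear_distance(side):
    """Hypothetical distance: position difference on one side (0 = J, 1 = E)."""
    return lambda a, b: abs(a[side] - b[side])


dist_j = linear_distance(0)
dist_e = linear_distance(1)

# Each correspondence is a (J-side position, E-side position) tuple.
consistent = [(0, 0), (1, 1)]    # distances (1, 1): near-near
inconsistent = [(0, 0), (1, 7)]  # distances (1, 7): near-far

best = select_alignment([inconsistent, consistent], dist_j, dist_e)
print(best)  # [(0, 0), (1, 1)]
```

The two toy alignments mirror Pair 1 and Pair 2 from the example figure, with distances (1, 1) and (1, 7). A fuller version would replace `linear_distance` with distances computed on the dependency trees using the dependency type distance tables.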