Language & Knowledge Engineering Lab
Example-based Machine Translation Pursuing Fully Structural NLP
Kurohashi-lab, M1, 56430, Toshiaki Nakazawa

Outline
I. History of Machine Translation
II. Introduction to recent MT systems
  i. Statistical Machine Translation (SMT)
  ii. Example-based Machine Translation (EBMT)
III. Related work for EBMT
  i. Logical Form
  ii. Efficient retrieval method
IV. EBMT pursuing fully structural NLP
V. Conclusion

History of Machine Translation
– Beginning of Machine Translation [Warren Weaver, 1947]: "When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"
– Doldrums of MT: MT quality did not improve despite much money being spent; NLP quality was not yet sufficient
– "Machine Translation based on analogy" is proposed [Nagao, 1981]
– The "Mu project" started
– MT quality had been improving because of the development of …
– SMT becomes active [Brown et al., 1993]
Statistical Machine Translation (SMT)
– Learn models for translation from a parallel corpus statistically
– Do not use any linguistic resources
– Small translation unit (= "word")
– Require a large parallel corpus for highly accurate translation

Parallel corpus example (a Japanese news article on a rice-planting festival in Wajima, Ishikawa, and its English version):
Japanese side:
田植えフェスティバル
石川県輪島市で外国の大使や一般の参加者など千人あまりが急な斜面の棚田で田植えを体験する催しが行われました。
輪島市白米町(しろよねまち)には千枚田(せんまいだ)と呼ばれる大小二千百枚の棚田が急な斜面から海に向かって拡がっています。
田植え体験は農作業を通して米作りの意義などを考えていこうという地球環境平和財団の呼び掛けで開かれたもので、海外三十四ヵ国の大使や書記官、それに一般の参加者ら合わせておよそ千人が集まりました。
田植えに使われた苗は去年の秋、天皇陛下が皇居で収穫された稲籾から育てたものです。
参加者たちは裸足になって水田に足を踏み入れ地元に伝わる田植え歌に合わせて慣れない手つきで苗を植えていました。
きょうの輪島市は雲が広がったもののまずまずの天気となり、出席された高円宮さまも海からの風に吹かれながら田植えに加わっていました。
地球環境平和財団では今年の夏休みに全国の子どもたちを対象に草刈りや生きものの観察会を開く他、秋には稲刈体験を行なう予定にしています。
English side:
Ambassadors and diplomats from 37 countries took part in a rice planting festival on Sunday in small paddies on steep hillsides in Wajima, central Japan. About one-thousand people gathered at the hill, where some two-thousand 100 miniature paddies, called Senmaida, stretch toward the Sea of Japan.
The event was organized by the private Foundation for Global Peace and Environment. The rice seedlings are grown from grain harvested by the Emperor at the Imperial Palace in Tokyo last autumn. Barefoot participants waded into the paddies to plant the seedlings by hand while singing a local folk song about the practice of rice planting.

Basic Method for SMT
Translate by maximizing the probability:
$$\hat{E} = \arg\max_{E} P(E \mid J) = \arg\max_{E} P(E)\,P(J \mid E)$$
The second form follows from Bayes' rule, $P(E \mid J) = P(E)\,P(J \mid E) / P(J)$, because $P(J)$ does not depend on $E$. $P(E)$ is the language model and $P(J \mid E)$ is the translation model, learned from a parallel corpus.

Translation Model
IBM Model 4 [Brown et al., 93] factors the translation model into a product of four sub-models:
Translation model = lexical translation × fertility × distortion × NULL generation, where
– lexical translation: probability of translating one E word into one J word
– fertility: the number of Japanese words that each English word generates
– distortion: model for word order
– NULL generation: model for generating J words from NULL, accounting for words with no English counterpart

Overview of EBMT
[Figure: Parallel Corpus → Alignment → TMDB (translation memory database); Input → Translation → Output; supported by advanced NLP technologies. Example alignment: 交差点で、 ↔ at the intersection]

Example-based Machine Translation (EBMT)
– Divide the input sentence into a few parts
– Find similar expressions (= examples, TMs) in the parallel corpus for each part
– Combine the examples to generate the output translation
– Use as many linguistic resources as possible
– A larger translation unit (larger example) is better

Flow of EBMT
[Figure: flow of EBMT]

Furthermore...
– The translation algorithm is implicit in EBMT → Probabilistic Model for EBMT [Aramaki et al., 05]
– Recently, the number of studies handling bigger units has been increasing
  – The most active: phrase-based SMT
– The difference between SMT and EBMT is becoming smaller
– SMT and EBMT will be merged (?)
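The "find similar expressions" step above needs a similarity measure; the retrieval work by Doi et al. discussed later in this deck uses word-based edit distance. A minimal sketch of that step in Python, assuming the input has already been divided into segmented parts and using a tiny hypothetical example database. `EXAMPLES`, `word_edit_distance` and `most_similar` are illustrative names, not taken from any real EBMT system, and the "combine" step is not attempted.

```python
from typing import List, Sequence, Tuple

# A translation example: (source-side tokens, target-side fragment).
# Hypothetical toy database, loosely based on the fragments on the slides.
Example = Tuple[List[str], str]

EXAMPLES: List[Example] = [
    (["家", "に", "入る", "時"], "when entering a house"),
    (["私", "の", "サイン"], "my signature"),
    (["信号", "は", "青", "でした"], "the traffic light was green"),
]


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over words: unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of `a`
    for i, wa in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence costs i deletions
        for j, wb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                           # delete wa
                curr[j - 1] + 1,                       # insert wb
                prev[j - 1] + (0 if wa == wb else 1),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def most_similar(part: Sequence[str], examples: Sequence[Example]) -> Tuple[Example, int]:
    """Linear scan: return the closest example and its distance (ties keep the first)."""
    best = min(examples, key=lambda ex: word_edit_distance(part, ex[0]))
    return best, word_edit_distance(part, best[0])


if __name__ == "__main__":
    # Hypothetical pre-divided input: 交差点に入る時 / 私の信号は青でした
    parts = [["交差", "点", "に", "入る", "時"], ["私", "の", "信号", "は", "青", "でした"]]
    for part in parts:
        (src, tgt), dist = most_similar(part, EXAMPLES)
        print(" ".join(part), "->", tgt, f"(distance {dist})")
```

Run as a script, it matches 交差 点 に 入る 時 to "when entering a house" at distance 2 (only 家 versus 交差 点 differs) and 私 の 信号 は 青 でした to "the traffic light was green" at distance 2. Filling in the mismatched 交差点 from another example and ordering the English fragments is the combining step, which this sketch leaves out. A linear scan runs one dynamic program per stored example, which is the cost that grouping sentences and A* search aim to reduce on a large corpus.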
Alignment Method Using Logical Form
– Logical Form [Arul et al., 01]
  – Represents the relations among the content words of a sentence as an unordered graph: nodes are content words, and edges indicate underlying semantic relations
  – Abstracts away language-particular aspects of a sentence, e.g., word order, inflectional morphology, function words
[Figure: aligned Logical Forms of a Spanish sentence and its English counterpart "Under Hyperlink Information, click the hyperlink address"]

Efficient Retrieval Method [Doi et al., 04]
– Similarity between the input and examples is calculated by word-based edit distance
– Finding suitable examples in a large parallel corpus takes a long time
– To resolve this problem:
  – Classify sentences into groups according to the number of content words and function words
  – Compress all sentences in a group into a "directed word graph"
  – Search for the best example in a group with the A* algorithm

Why EBMT?
– Pursuing structural NLP
  – Improvement of basic analyses leads to improvement of MT, an application of those analyses
  – Feedback from the application (MT) can be expected
– Adequacy of the problem setting
  – Not a large corpus, but similar examples in a relatively close domain
  – Ex. translation of an updated version of an instruction manual, related patent documents, ...

Overview of EBMT
[Figure: Parallel Corpus → Alignment → TMDB; Input → EBMT Translation → Output; supported by advanced NLP technologies]

Alignment
Japanese: 交差点で、突然あの車が飛び出して来たのです。
English: The car came at me from the side at the intersection.
[Figure: dependency structures of the segmented pair 交差 点で、 / 突然 / あの / 車が / 飛び出して / 来た / のです 。 and "the car came at me from the side at the intersection", with corresponding phrases linked]
1. Transform both sentences into dependency structures
2. Align words using a bilingual lexicon
3. Extend the correspondences to phrases
4. Extract translation examples

Translation
Input: 交差点に入る時私の信号は青でした。
Translation examples matched against parts of the input:
– 交差点で、突然あの車が飛び出して来たのです。 ↔ "the car came at me from the side at the intersection" (provides 交差 (cross) 点 (point) ↔ "the intersection")
– 家 (house) に 入る (enter) 時 (when) … 脱ぐ (put off) ↔ "to remove … when entering a house" (provides 入る 時 ↔ "when entering")
– 私 の (my) サイン (sign) ↔ "my signature" (provides 私 の ↔ "my")
– 信号 (signal) は 青 (blue) でした (was) 。 ↔ "The traffic light was green" (provides 信号 は 青 でした ↔ "traffic light was green")
The English parts are combined, with a language model, into the output:
Output: My traffic light was green when entering the intersection.

IWSLT2005
– IWSLT: International Workshop on Spoken Language Translation
  – Aims at translating ASR (Automatic Speech Recognition) output
– Outline of the campaign
  – Training set: a parallel corpus of 20K sentences
  – Development set: two sets of 500 and 506 sentences
  – Test set: manual transcription and ASR output (500 sentences each)

Evaluation Results (Manual Transcription, Supplied & Tools)
System      BLEU     NIST
ATR-C3      0.4774   8.1720
MICROSOFT   0.4057   8.0375
ATR-SLR     0.3884   4.3928
TUV         0.3718   7.8472
NGKUT       0.3418   7.7158
USC         0.2741   2.9648

Conclusion
– In this presentation:
  – History of Machine Translation
  – SMT and EBMT
  – Two related studies for EBMT
  – Introduction of our EBMT system
– Future work: improve our EBMT system
  – Resolve the paraphrase problem
  – Apply anaphora resolution
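The evaluation table above reports BLEU and NIST scores. As a hedged illustration of what the BLEU column measures, here is a minimal corpus-level BLEU in Python following the usual definition (Papineni et al., 2002): clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. Simplifying assumptions: one reference per sentence, whitespace-tokenized input, no smoothing. `corpus_bleu` and `ngrams` are illustrative names; this is not the official IWSLT 2005 scoring tool and is not meant to reproduce the table.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU; hyps[i] is scored against the single reference refs[i]."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, one zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, scoring the hypothesis "the traffic light was green" against the reference "my traffic light was green" gives precisions 4/5, 3/4, 2/3 and 1/2 with equal lengths, so BLEU = 0.2 ** 0.25 ≈ 0.669. Pooling counts over the whole corpus before taking the geometric mean, rather than averaging sentence scores, is what makes this the corpus-level variant.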