A Comparison of Statistical Post-Editing on Chinese and Japanese
Midori Tatsumi and Yanli Sun
Under the supervision of: Sharon O’Brien; Minako O’Hagan; Fred Hollowood; Johann Roturier

Outline
1 Introduction
2 Analysis of the Modifications Made by SPE
3 Evaluation on Sentence Level
4 Conclusions

Introduction
• Rule-Based Machine Translation (RBMT)
  – Three stages:
    • Analysis: analyze a source text into abstract lexical and structural representations
    • Transfer: convert the source-language representations into target-language representations
    • Generation: generate the target text
• Statistical Machine Translation (SMT)
  – Two stages:
    • Training: automatically learn translation and language knowledge from a parallel corpus
    • Decoding: translate new sentences using the learned knowledge
• Post-Editing (PE)
  – Human post-editing
  – Automatic post-editing
  – Statistical post-editing (SPE)

Introduction – Flowchart of SPE
• Statistical Post-Editing (SPE) of Rule-Based Machine Translation (RBMT) output
  – Knight & Chander (1994)
  – Simard et al. (2007a, 2007b)
[Figure: flowcharts of RBMT and SPE]
  RBMT: Source → RBMT → Output 1 → Human post-editor → Final output
  SPE: Source → RBMT → Output 1 → SPE module → Output 2 → Human post-editor → Final output
  The SPE module is an SMT system trained on RBMT output paired with reference translations.

Introduction – Experimental setting
• Source: English; targets: Chinese (ZH) and Japanese (JA)
• RBMT: Systran, with user dictionaries (UD) of 8,832 entries (ZH) and 6,363 entries (JA)
• SPE module: Moses (SMT), trained on RBMT output paired with reference translations from a translation memory: 529,822 (ZH) and 143,742 (JA)

Introduction – Evaluate SPE
• Compare Output 2 (SPE output) with Output 1 (RBMT output)

Analysis of the Modifications Made by SPE
Methodology
• Pilot project
  – Random selection of 100 sentences for each language
• Classify and evaluate the changes
  – Classification (Vilar et al. 2006)
    • Alteration, deletion and addition of content/function words
    • Form: tense/voice, imperative, formality (politeness)
    • Fixed expression
    • Reordering
    • Punctuation
  – Evaluation (Dugast et al. 2007)
    • Improvement
    • Degradation
    • Equivalent

Analysis of the Modifications Made by SPE
Quantitative Evaluation
• Distribution of modifications in Chinese and Japanese (number of changes)

                              Improvement   Degradation    Equivalent
                                 ZH    JA      ZH    JA      ZH    JA
Alteration   Content words      137    45      19    40      28    25
             Function words      38    45       6     9      17    30
Deletion     Content words        0     9       0     2       0     1
             Function words      51    57       4     5      12    16
Addition     Content words        4     0       3     2       2     0
             Function words      12     1       8     2      15     1
Forms        Tense or voice       6     3       0     0       3     5
             Formality            0     1       1     0       0     0
             Imperative           0     8       0     0       0     2
Fixed expression                  8     0       0     0       0     1
Word / phrase reordering          9     1       3     3       0     1
Punctuation                      31    47       4     9       0     4
Total                           296   217      48    72      77    85
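The modifications counted above were classified and judged by hand. As a rough illustration only, a first automatic pass over token-level differences between Output 1 and Output 2 could be sketched with Python's standard difflib module. This is a sketch under stated assumptions, not the study's procedure: it expects already-segmented tokens (Chinese and Japanese need a word segmenter first), it cannot tell content words from function words (that would need a part-of-speech tagger), and it has no notion of reordering. The function name classify_changes and the hand-segmented token lists are hypothetical.

```python
from difflib import SequenceMatcher


def classify_changes(mt_tokens: list[str], spe_tokens: list[str]) -> dict[str, int]:
    """Count changed spans between RBMT output (Output 1) and SPE output (Output 2).

    Hypothetical helper: maps difflib opcodes onto three of the change types
    used in the manual classification. Tokens must already be segmented;
    content vs. function words are not distinguished.
    """
    counts = {"alteration": 0, "deletion": 0, "addition": 0}
    # autojunk=False so that frequent tokens are never treated as junk
    matcher = SequenceMatcher(a=mt_tokens, b=spe_tokens, autojunk=False)
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "replace":    # MT span rewritten by SPE
            counts["alteration"] += 1
        elif tag == "delete":   # MT span dropped by SPE
            counts["deletion"] += 1
        elif tag == "insert":   # span added by SPE
            counts["addition"] += 1
        # "equal" spans are unchanged and not counted
    return counts


if __name__ == "__main__":
    # Token lists hand-segmented from the qualitative ZH examples (segmentation assumed)
    print(classify_changes(["在", "您", "配置", "您", "的"], ["配置"]))
    print(classify_changes(["在", "Spim", "选项卡"], ["在", "Spim", "选项卡", "上"]))
    print(classify_changes(["这些", "威胁", "然后"], ["然后", ",", "这些", "威胁"]))
```

The first call reports two deletion spans (在 and 您的), the second one addition (上). The third shows the reordering limitation: the moved 然后 comes out as one addition plus one deletion, which a human annotator would count as a single reordering.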
Analysis of the Modifications Made by SPE
Qualitative Evaluation • Similarities
• Alteration of function words
  – Source: To maintain …
    JA MT: 保守するため…  SPE: 維持するには…
  – Source: Reverts to …
    ZH MT: 恢复对…  SPE: 恢复到…
• Deletion of function words
  – Source: the actions that you specify for that rule
    JA MT: あなたがその規則のために指定する処理  SPE: そのルールに指定する処理
  – Source: After you configure your …
    ZH MT: 在您配置您的…  SPE: 配置…
• Punctuation
  – Source: MPE provides an option …
    JA MT: オプションを提供します。  SPE: オプションがあります.
  – Source: while the synchronization is in progress…
    ZH MT: ,当同步进展中时…  SPE: 同步处理…
Analysis of the Modifications Made by SPE
Qualitative Evaluation • Differences
• Alteration of content words
  – Source: console commands
    JA MT: コンソールは命じます  SPE: console コマンド
  – Source: number
    JA MT: 番号  SPE: 数
  – Source: subdomains
    ZH MT: subdomains  SPE: 子域
• Addition of function words
  – Source: A black dash indicates that it is disabled.
    ZH MT: 黑色破折号表明它禁用。  SPE: 黑色线表明它已禁用。
  – Source: On the Spim tab…
    ZH MT: 在 Spim 选项卡…  SPE: 在 Spim 选项卡上…

Analysis of the Modifications Made by SPE
Qualitative Evaluation • Differences
• Imperative forms
  – Source: (imperative ending)
    JA MT: して下さい  SPE: します
• Fixed expression
  – Source: In general, …
    ZH MT: 一般情况下,…  SPE: 通常情况下,…
• Reordering
  – Source: These threats are then…
    ZH MT: 这些威胁然后…  SPE: 然后,这些威胁…

Evaluation on Sentence Level
• Methodology
  – Same 100 segments
  – Effect of SPE on fluency, adequacy and PE time
  – Four evaluators per language
  – MT output and SPE output randomly distributed between Output 1 and Output 2
• Example item (criteria: Fluency, Adequacy, Less PE time)
  – Source_EN: Turns on or off the special meaning of metacharacters.
  – Output 1: オン/オフ回転メタ文字の特別な意味。
  – Output 2: 有効または無効にメタ文字の特別な意味します.
  – Rating for each criterion: 1 / 2 / E (Output 1 better, Output 2 better, or equal)
• Kappa scores (inter-evaluator agreement)
  – Japanese: moderate to substantial agreement
  – Chinese: generally fair agreement

    Criterion       Chinese   Japanese
    Fluency           0.276      0.598
    Adequacy          0.288      0.582
    Less PE time      0.284      0.624

Evaluation on Sentence Level
Results and Analysis
• Improvement by SPE:
  – Chinese: fluency and adequacy ≈ 40%, PE time ≈ 50%
  – Japanese: fluency, adequacy and PE time ≈ 60%

  Percentage of segments for which each output was preferred
              Chinese                           Japanese
  Preferred   Fluency  Adequacy  Less PE time   Fluency  Adequacy  Less PE time
  MT            12.75     15.50         15.00     14.50      8.00          9.75
  SPE           37.75     38.00         48.25     59.25     61.50         62.50
  Equal         49.50     46.50         36.75     26.25     30.50         27.75
  Total           100       100           100       100       100           100

Conclusions
• SPE generates more improvements than degradations
  – Three-fold for Japanese (217 vs. 72 changes); six-fold for Chinese (296 vs. 48)
• Linguistic changes vary between ZH and JA
• SPE changes are generally limited to the word level
• SPE improves fluency and adequacy and shortens PE time
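The sentence-level slides report kappa scores and preference percentages for four evaluators per language but do not say how they were computed. The sketch below assumes Fleiss' kappa, a common choice for more than two raters, and assumes each evaluator's 1/2/E choice has already been mapped back to MT, SPE or Equal, since the two outputs were presented in random order. The helper names and the three-segment sample are made up for illustration and are not the study's data. agreement_label applies the Landis and Koch (1977) bands, under which the reported scores land where the slides put them: about 0.28 is fair, 0.582 and 0.598 moderate, 0.624 substantial.

```python
from collections import Counter
from collections.abc import Sequence

# Assumed labels after mapping "1"/"2" back to the system that produced each
# output (outputs were shown in random order); "Equal" corresponds to "E".
CATEGORIES = ("MT", "SPE", "Equal")


def preference_percentages(judgements: Sequence[Sequence[str]]) -> dict[str, float]:
    """Percentage of judgements per category, pooled over all raters.

    judgements[i][r] is the label rater r gave segment i. When every rater
    judges every segment, this equals the mean of the raters' percentages.
    """
    total = sum(len(row) for row in judgements)
    counts = Counter(label for row in judgements for label in row)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


def fleiss_kappa(judgements: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for a fixed number of raters per segment."""
    n_segments = len(judgements)
    n_raters = len(judgements[0])
    if n_raters < 2 or any(len(row) != n_raters for row in judgements):
        raise ValueError("need the same number (>= 2) of raters for every segment")
    # n_ij: how many raters put segment i into category j
    table = [[list(row).count(c) for c in CATEGORIES] for row in judgements]
    # P_i: proportion of agreeing rater pairs on segment i; p_bar: their mean
    p_i = [(sum(n * n for n in counts) - n_raters) / (n_raters * (n_raters - 1))
           for counts in table]
    p_bar = sum(p_i) / n_segments
    # p_j: overall proportion of category j; p_e: expected chance agreement
    p_j = [sum(counts[j] for counts in table) / (n_segments * n_raters)
           for j in range(len(CATEGORIES))]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def agreement_label(kappa: float) -> str:
    """Verbal band after Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Made-up sample: 3 segments x 4 raters (NOT the study's data)
    sample = [
        ["SPE", "SPE", "SPE", "SPE"],
        ["MT", "MT", "Equal", "Equal"],
        ["Equal", "Equal", "Equal", "SPE"],
    ]
    print(preference_percentages(sample))
    kappa = fleiss_kappa(sample)
    print(round(kappa, 3), agreement_label(kappa))
```

On the made-up sample the percentages come out at about 16.7 (MT), 41.7 (SPE) and 41.7 (Equal), and kappa is 17/45 ≈ 0.378, i.e. fair agreement. Pooling counts instead of averaging per-rater percentages gives the same figures here only because every rater judges every segment.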