
Stat-XFER:
A General Framework for
Search-based Syntax-driven MT
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint work with:
Erik Peterson, Alok Parlikar, Vamshi Ambati, Christian Monson, Ariadna
Font-Llitjos, Lori Levin, Jaime Carbonell – Carnegie Mellon University
Shuly Wintner, Danny Shacham, Nurit Melnik - University of Haifa
Roberto Aranovich – University of Pittsburgh
Outline
• Context and Rationale
• CMU Statistical Transfer MT Framework
• Broad Resource Scenario: Chinese-to-English
• Low Resource Scenario: Hebrew-to-English
• Open Research Challenges
• Conclusions
Current State-of-the-Art in
Machine Translation
• MT underwent a major paradigm shift over the past 15
years:
– From manually crafted rule-based systems with manually
designed knowledge resources
– To search-based approaches founded on automatic extraction of
translation models/units from large sentence-parallel corpora
• Current Dominant Approach: Phrase-based Statistical
MT:
– Extract and statistically model large volumes of phrase-to-phrase
correspondences from automatically word-aligned parallel corpora
– “Decode” new input by searching for the most likely
sequence of phrase matches, using a combination of
features, including a statistical Language Model for the
target language
Current State-of-the-art in
Machine Translation
• Phrase-based MT State-of-the-art:
– Requires minimally several million words of parallel
text for adequate training
– Mostly limited to language-pairs for which such data
exists: major European languages, Arabic, Chinese,
Japanese, a few others…
– Linguistically shallow and highly lexicalized models
result in weak generalization
– Best performance levels (BLEU=~0.6) on Arabic-to-English provide
understandable but often still ungrammatical or somewhat disfluent
translations
– Ill suited for Hebrew and most of the world’s minor
and resource-poor languages
Rule-based vs. Statistical MT
• Traditional Rule-based MT:
– Expressive and linguistically-rich formalisms capable of
describing complex mappings between the two languages
– Accurate “clean” resources
– Everything constructed manually by experts
– Main challenge: obtaining broad coverage
• Phrase-based Statistical MT:
– Learn word and phrase correspondences automatically
from large volumes of parallel data
– Search-based “decoding” framework:
• Models propose many alternative translations
• Effective search algorithms find the “best” translation
– Main challenge: obtaining high translation accuracy
Research Goals
• Long-term research agenda (since 2000) focused on
developing a unified framework for MT that addresses
the core fundamental weaknesses of previous
approaches:
– Representation – explore richer formalisms that can
capture complex divergences between languages
– Ability to handle morphologically complex languages
– Methods for automatically acquiring MT resources from
available data and combining them with manual resources
– Ability to address both rich and poor resource scenarios
• Main research funding sources: NSF (AVENUE and
LETRAS projects) and DARPA (GALE)
CMU Statistical Transfer
(Stat-XFER) MT Approach
• Integrate the major strengths of rule-based and
statistical MT within a common framework:
– Linguistically rich formalism that can express complex and
abstract compositional transfer rules
– Rules can be written by human experts and also acquired
automatically from data
– Easy integration of morphological analyzers and
generators
– Word and syntactic-phrase correspondences can be
automatically acquired from parallel text
– Search-based decoding from statistical MT adapted to find
the best translation within the search space: multi-feature
scoring, beam-search, parameter optimization, etc.
– Framework suitable for both resource-rich and resource-poor language scenarios
Stat-XFER Main Principles
• Framework: Statistical search-based approach with
syntactic translation transfer rules that can be acquired
from data but also developed and extended by experts
• Automatic Word and Phrase translation lexicon
acquisition from parallel data
• Transfer-rule Learning: apply ML-based methods to
automatically acquire syntactic transfer rules for
translation between the two languages
• Elicitation: use bilingual native informants to produce a
small high-quality word-aligned bilingual corpus of
translated phrases and sentences
• Rule Refinement: refine the acquired rules via a process
of interaction with bilingual informants
• XFER + Decoder:
– XFER engine produces a lattice of possible transferred
structures at all levels
– Decoder searches and selects the best scoring combination
Stat-XFER MT Approach
[Figure: the classic MT pyramid — Direct approaches (SMT, EBMT) at the base, Transfer Rules (the level at which Statistical-XFER operates) in the middle, and Semantic Analysis / Sentence Planning rising to Interlingua at the apex; Syntactic Parsing on the source side (e.g. Quechua) and Text Generation on the target side (e.g. English).]
Source Input
בשורה הבאה

[System pipeline: Preprocessing → Morphology → Transfer Engine (consulting the Transfer Rules and Translation Lexicon) → Decoder (with Language Model + additional features) → English Output]

Transfer Rules
{NP1,3}
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
((X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1))

Translation Lexicon
N::N |: ["$WR"] -> ["BULL"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "BULL"))
N::N |: ["$WRH"] -> ["LINE"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "LINE"))

Translation Output Lattice
(0 1 "IN" @PREP)
(1 1 "THE" @DET)
(2 2 "LINE" @N)
(1 2 "THE LINE" @NP)
(0 2 "IN LINE" @PP)
(0 4 "IN THE NEXT LINE" @PP)

English Output
in the next line
Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
; Alignments
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; x-side constraints
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
; y-side constraints
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
Each rule carries type information (NP::NP), part-of-speech/constituent sequences, alignments, x-side and y-side constraints, and optionally xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).
Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; Value constraints
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
; Agreement constraint
((Y2 GENDER) = (Y4 GENDER))
)
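A minimal sketch of how such value and agreement constraints can be checked against feature structures during rule application; the data structures and the treatment of unspecified features as freely unifying are simplifying assumptions, not the engine's actual implementation:

```python
# Sketch: checking value and agreement constraints over nested
# feature structures. Paths like (X1 AGR) index into dicts.
# ASSUMPTION: unspecified features unify with anything.

def get_path(fs, path):
    for key in path:
        if not isinstance(fs, dict) or key not in fs:
            return None
        fs = fs[key]
    return fs

def check_value(fs, path, value):
    """Value constraint, e.g. ((X1 AGR) = *3-SING)."""
    found = get_path(fs, path)
    return found is None or found == value

def check_agreement(fs, path_a, path_b):
    """Agreement constraint, e.g. ((Y2 GENDER) = (Y4 GENDER))."""
    a, b = get_path(fs, path_a), get_path(fs, path_b)
    return a is None or b is None or a == b

fs = {"X1": {"AGR": "*3-SING", "DEF": "*DEF"},
      "X3": {"AGR": "*3-SING", "COUNT": "+"},
      "Y2": {"GENDER": "f"}, "Y4": {"GENDER": "f"}}
assert check_value(fs, ("X1", "AGR"), "*3-SING")
assert check_agreement(fs, ("Y2", "GENDER"), ("Y4", "GENDER"))
```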
Translation Lexicon: Examples
PRO::PRO |: ["ANI"] -> ["I"]
(
(X1::Y1)
((X0 per) = 1)
((X0 num) = s)
((X0 case) = nom)
)
N::N |: ["$&H"] -> ["HOUR"]
(
(X1::Y1)
((X0 NUM) = s)
((Y0 NUM) = s)
((Y0 lex) = "HOUR")
)
PRO::PRO |: ["ATH"] -> ["you"]
(
(X1::Y1)
((X0 per) = 2)
((X0 num) = s)
((X0 gen) = m)
((X0 case) = nom)
)
N::N |: ["$&H"] -> ["hours"]
(
(X1::Y1)
((Y0 NUM) = p)
((X0 NUM) = p)
((Y0 lex) = "HOUR")
)
Hebrew Transfer Grammar
Example Rules
{NP1,2}
;;SL: $MLH ADWMH
;;TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
(X2::Y1)
(X1::Y2)
((X1 def) = -)
((X1 status) =c absolute)
((X1 num) = (X2 num))
((X1 gen) = (X2 gen))
(X0 = X1)
)

{NP1,3}
;;SL: H $MLWT H ADWMWT
;;TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)
The Transfer Engine
• Input: source-language input sentence, or source-language confusion network
• Output: lattice representing collection of translation
fragments at all levels supported by transfer rules
• Basic Algorithm: “bottom-up” integrated “parsing-transfer-generation” guided by the transfer rules
– Start with translations of individual words and phrases
from translation lexicon
– Create translations of larger constituents by applying
applicable transfer rules to previously created lattice
entries
– Beam-search controls the exponential combinatorics of the
search-space, using multiple scoring features (a sketch of the basic bottom-up loop follows below)
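A minimal sketch, in Python, of the bottom-up loop described above, assuming binary rules, plain string concatenation, and toy data structures; function and variable names are illustrative, not the actual engine's API:

```python
from collections import defaultdict

def transfer(words, lexicon, rules):
    """Sketch of bottom-up chart transfer (binary rules only).
    lexicon: word -> list of (category, translation)
    rules:   list of (lhs_cat, (cat1, cat2), target_order)"""
    chart = defaultdict(set)  # (start, end) -> {(category, translation)}
    for i, w in enumerate(words):                      # seed with word translations
        for cat, trans in lexicon.get(w, []):
            chart[(i, i + 1)].add((cat, trans))
    added = True
    while added:                                       # apply rules until fixpoint
        added = False
        spans = [(span, frozenset(es)) for span, es in chart.items()]
        for (i, k), left in spans:
            for (k2, j), right in spans:
                if k2 != k:
                    continue
                for lhs, (c1, c2), order in rules:
                    for cat_l, tr_l in left:
                        for cat_r, tr_r in right:
                            if (cat_l, cat_r) == (c1, c2):
                                parts = (tr_l, tr_r)
                                entry = (lhs, " ".join(parts[x] for x in order))
                                if entry not in chart[(i, j)]:
                                    chart[(i, j)].add(entry)
                                    added = True
    return chart

# Hebrew noun-adjective reordering, as in the NP1 rules shown earlier:
lexicon = {"$MLH": [("NP1", "dress")], "ADWMH": [("ADJ", "red")]}
rules = [("NP1", ("NP1", "ADJ"), (1, 0))]   # [NP1 ADJ] -> [ADJ NP1]
print(transfer("$MLH ADWMH".split(), lexicon, rules)[(0, 2)])
# -> {('NP1', 'red dress')}
```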
The Transfer Engine
• Some Unique Features:
– Works with either learned or manually-developed
transfer grammars
– Handles rules with or without unification constraints
– Supports interfacing with servers for morphological
analysis and generation
– Can handle ambiguous source-word analyses and/or
SL segmentations represented in the form of lattice
structures
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" -17.3393 "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" -13.9523 "&BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" -18.0325 "&BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" -21.0649 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
The Lattice Decoder
• Simple Stack Decoder, similar in principle to simple
Statistical MT decoders
• Searches for best-scoring path of non-overlapping
lattice arcs
• No reordering during decoding
• Scoring based on log-linear combination of scoring
features, with weights trained using Minimum Error Rate
Training (MERT)
• Scoring components (a scoring sketch follows below):
– Statistical Language Model
– Rule Scores
– Lexical Probabilities
– Fragmentation: how many arcs to cover the entire translation?
– Length Penalty: how far from expected target length?
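To make the combination concrete, here is a minimal sketch of a log-linear path score over lattice arcs. The feature definitions and weights are illustrative stand-ins; in the real system the weights come from MERT:

```python
# Sketch: log-linear scoring of one decoder path (a list of arcs).
# Arc format and weight values are illustrative assumptions.

def path_score(arcs, weights, lm_logprob, expected_length):
    """arcs: list of (start, end, translation, rule_score, lex_logprob)."""
    n_words = sum(len(arc[2].split()) for arc in arcs)
    features = {
        "lm": lm_logprob,                            # statistical language model
        "rules": sum(arc[3] for arc in arcs),        # transfer-rule scores
        "lex": sum(arc[4] for arc in arcs),          # lexical probabilities
        "frag": -len(arcs),                          # fragmentation: fewer arcs is better
        "length": -abs(n_words - expected_length),   # length penalty
    }
    return sum(weights[f] * v for f, v in features.items())

weights = {"lm": 1.0, "rules": 0.5, "lex": 0.8, "frag": 0.3, "length": 0.2}
arcs = [(0, 2, "IN THE NEXT", -1.2, -0.9), (2, 4, "LINE", -0.4, -0.2)]
print(path_score(arcs, weights, lm_logprob=-7.5, expected_length=4))
```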
XFER Lattice Decoder
00
ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0,
Words: 13,13
235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE')
(NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0
(NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100
(NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A')
(NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>
Stat-XFER MT Systems
• General Stat-XFER framework under development for
past seven years
• Systems so far:
– Chinese-to-English
– Hebrew-to-English
– Urdu-to-English
– Hindi-to-English
– Dutch-to-English
– Mapudungun-to-Spanish
• In progress or planned:
– Brazilian Portuguese-to-English
– Native-Brazilian languages to Brazilian Portuguese
– Hebrew-to-Arabic
– Quechua-to-Spanish
– Turkish-to-English
MT Resource Acquisition in
Resource-rich Scenarios
• Scenario: Significant amounts of parallel-text at
sentence-level are available
– Parallel sentences can be word-aligned and parsed (at
least on one side, ideally on both sides)
• Goal: Acquire both broad-coverage translation lexicons
and transfer rule grammars automatically from the data
• Syntax-based translation lexicons:
– Broad-coverage constituent-level translation equivalents at
all levels of granularity
– Can serve as the elementary building blocks for transfer
trees constructed at runtime using the transfer rules
Acquisition Process
• Automatic Process for Extracting Syntax-driven Rules and Lexicons from sentence-parallel data:
1. Word-align the parallel corpus (GIZA++)
2. Parse the sentences independently for both languages
3. Run our new PFA Constituent Aligner over the parsed sentence pairs
4. Extract all aligned constituents from the parallel trees
5. Extract all derived synchronous transfer rules from the constituent-aligned parallel trees
6. Construct a “data-base” of all extracted parallel constituents and synchronous rules with their frequencies and model them statistically (assign them relative-likelihood probabilities; a sketch follows below)
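As an illustration of step 6, a small sketch of one way to assign relative-likelihood probabilities. The slides do not specify the conditioning, so dividing each rule's count by the total count of rules sharing the same source side is an assumption made here:

```python
# Sketch of step 6: turn rule frequencies into relative likelihoods.
# ASSUMPTION: probabilities are conditioned on the rule's source side.
from collections import defaultdict

def relative_likelihoods(rule_counts):
    """rule_counts: {(source_side, target_side): count}"""
    source_totals = defaultdict(int)
    for (src, tgt), count in rule_counts.items():
        source_totals[src] += count
    return {(src, tgt): count / source_totals[src]
            for (src, tgt), count in rule_counts.items()}

counts = {("NP [VP 的 NP]", "NP [NP that VP]"): 5,
          ("NP [VP 的 NP]", "NP [VP NP]"): 1}
print(relative_likelihoods(counts))
# first rule gets 5/6, second gets 1/6
```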
PFA Constituent Node Aligner
• Input: a bilingual pair of parsed and word-aligned
sentences
• Goal: find all sub-sentential constituent alignments
between the two trees which are translation equivalents
of each other
• Equivalence Constraint: a pair of constituents <S,T>
are considered translation equivalents if:
– All words in yield of <S> are aligned only to words in yield of <T>
(and vice-versa)
– If <S> has a sub-constituent <S1> that is aligned to <T1>, then
<T1> must be a sub-constituent of <T> (and vice-versa)
• Algorithm is a bottom-up process starting from word-level, marking nodes that satisfy the constraints (sketched below)
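A minimal sketch of the equivalence check, assuming constituents are represented by their yields (sets of word indices) and the word alignment by a set of index pairs; this illustrates the first constraint only, with the nesting condition handled by the bottom-up marking, and is not the actual PFA implementation:

```python
# Sketch: yield-based translation-equivalence check.
# align: set of (source_index, target_index) word-alignment links.

def translation_equivalent(s_yield, t_yield, align):
    """<S,T> pass iff every aligned word of S maps into T's yield,
    and vice-versa (unaligned words are unconstrained)."""
    s_ok = all(t in t_yield for s, t in align if s in s_yield)
    t_ok = all(s in s_yield for s, t in align if t in t_yield)
    return s_ok and t_ok

def mark_aligned_nodes(s_yields, t_yields, align):
    # Bottom-up marking over candidate constituent yields; the
    # sub-constituent (nesting) constraint would be verified on
    # the marked pairs in a full implementation.
    return [(s, t) for s in s_yields for t in t_yields
            if translation_equivalent(s, t, align)]

align = {(0, 1), (1, 0)}   # two crossing word links
print(mark_aligned_nodes([{0, 1}], [{0, 1}], align))
# -> [({0, 1}, {0, 1})]
```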
PFA Node Alignment Algorithm Example
• Words don't have to align one-to-one
• Constituent labels can be different in each language
• Tree structures can be highly divergent
PFA Node Alignment Algorithm Example
• Aligner uses a clever arithmetic manipulation to enforce equivalence constraints
• Resulting aligned nodes are highlighted in figure
PFA Node Alignment Algorithm Example
Extraction of Phrases:
• Get the yields of the aligned nodes and add them to a phrase table, tagged with syntactic categories on both source and target sides
• Example: NP # NP :: 澳洲 # Australia
PFA Node Alignment Algorithm Example
All Phrases from this tree pair:
1. IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries
that have diplomatic relations with North Korea .
2. VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have
diplomatic relations with North Korea
3. NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic
relations with North Korea
4. VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea
5. NP # NP :: 邦交 # diplomatic relations
6. NP # NP :: 北韩 # North Korea
7. NP # NP :: 澳洲 # Australia
PFA Constituent Node
Alignment Performance
• Evaluation Data: Chinese-English Treebank
– Parallel Chinese-English Treebank with manual word-alignments
– 3342 Sentence Pairs
• Created a “Gold Standard” of constituent alignments using the manual word-alignments and treebank trees
– Node Alignments: 39874 (about 12 per tree pair)
– NP to NP Alignments: 5427
• Manual inspection confirmed that the constituent alignments are extremely accurate (>95%)
• Evaluation: Run PFA Aligner with automatic word alignments on same data and compare with the “Gold Standard” alignments
PFA Constituent Node
Alignment Performance
•Viterbi word alignments from Chinese-English and reverse directions were
merged using different algorithms
•Tested the performance of Node-Alignment with each resulting alignment
Viterbi Combination     Precision   Recall   F-Measure
Intersection            0.6278      0.5525   0.5877
Union                   0.8054      0.2778   0.4131
Sym-1 (Thot Toolkit)    0.7182      0.4525   0.5552
Sym-2 (Thot Toolkit)    0.7170      0.4602   0.5606
Grow-Diag-Final         0.4040      0.2500   0.3089
Transfer Rule Learning
• Input: Constituent-aligned parallel trees
• Idea: Aligned nodes act as possible decomposition
points of the parallel trees
– The sub-trees of any aligned pair of nodes can be broken
apart at any lower-level aligned nodes, creating an
inventory of “treelet” correspondences
– Synchronous “treelets” can be converted into synchronous
rules
• Algorithm:
– Find all possible treelet decompositions from the node-aligned trees
– “Flatten” the treelets into synchronous CFG rules (sketched below)
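A small sketch of the flattening step, using illustrative input representations (child-label sequences plus 0-based alignment links), not the project's actual code:

```python
# Sketch: flatten an aligned treelet pair into a synchronous rule
# in the formalism used in these slides.

def flatten(src_label, src_children, tgt_label, tgt_children, links):
    head = "{}::{} [{}] -> [{}]".format(
        src_label, tgt_label,
        " ".join(src_children), " ".join(tgt_children))
    aligns = "\n".join("(X{}::Y{})".format(s + 1, t + 1) for s, t in links)
    # Constraints are not extracted at this stage (as noted above).
    return "{}\n(\n;; Alignments\n{}\n)".format(head, aligns)

print(flatten("IP", ["NP", "VP", "."], "S", ["NP", "VP", "."],
              [(0, 0), (1, 1)]))
# IP::S [NP VP .] -> [NP VP .]
# (
# ;; Alignments
# (X1::Y1)
# (X2::Y2)
# )
```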
Rule Extraction
Algorithm
Sub-Treelet extraction:
Extract Sub-tree segments including
synchronous alignment information in
the target tree. All the sub-trees and
the super-tree are extracted.
Rule Extraction
Algorithm
Flat Rule Creation:
Each of the treelet pairs is flattened
to create a Rule in the ‘Avenue
Formalism’ –
Four major parts to the rule:
1. Type of the rule: Source and
Target side type information
2. Constituent sequence of the
synchronous flat rule
3. Alignment information of the
constituents
4. Constraints in the rule
(Currently not extracted)
Rule Extraction
Algorithm
Flat Rule Creation:
Sample rule:
IP::S [ NP VP .] -> [NP VP .]
(
;; Alignments
(X1::Y1)
(X2::Y2)
;;Constraints
)
Rule Extraction
Algorithm
Flat Rule Creation:
Sample rule:
NP::NP [VP 北 CD 有 邦交 ] -> [one
of the CD countries that VP]
(
;; Alignments
(X1::Y7)
(X3::Y4)
)
Note:
1. Any one-to-one aligned words
are elevated to Part-Of-Speech
in flat rule.
2. Any non-aligned words on
either source or target side
remain lexicalized
Rule Extraction
Algorithm
All rules extracted:
VP::VP [VC NP] -> [VBZ NP]
(
(*score* 0.5)
;; Alignments
(X1::Y1)
(X2::Y2)
)
All rules extracted:
NP::NP [VP 北 CD 有 邦交 ] -> [one of the CD countries that VP]
(
(*score* 0.5)
;; Alignments
(X1::Y7)
(X3::Y4)
)
IP::S [ NP VP ] -> [NP VP ]
(
(*score* 0.5)
;; Alignments
(X1::Y1)
(X2::Y2)
)
NP::NP [ “北韩”] -> [“North” “Korea”]
(
;Many to one alignment is a phrase
)
VP::VP [VC NP] -> [VBZ NP]
(
(*score* 0.5)
;; Alignments
(X1::Y1)
(X2::Y2)
)
NP::NP [NR] -> [NNP]
(
(*score* 0.5)
;; Alignments
(X1::Y1)
(X2::Y2)
)
VP::VP [北 NP VE NP] -> [ VBP NP with NP]
(
(*score* 0.5)
;; Alignments
(X2::Y4)
(X3::Y1)
(X4::Y2)
)
Chinese-English System
• Developed over past year under DARPA/GALE
funding (within IBM-led “Rosetta” team)
• Participated in recent NIST MT-08 Evaluation
• Large-scale broad-coverage system
• Integrates large manual resources with
automatically extracted resources
• Current performance-level is still inferior to
state-of-the-art phrase-based systems
Chinese-English System
• Lexical Resources:
– Manual Lexicons (base forms):
• LDC, ADSO, Wiki
• Total number of entries: 1.07 million
– Automatically acquired from parallel data:
• Approx 5 million sentences of LDC/GALE data
• Filtered down to phrases < 10 words in length
• Full (inflected) word forms
• Total number of entries: 2.67 million
Chinese-English System
• Transfer Rules:
– 61 manually developed transfer rules
– High-accuracy rules extracted from manually word-aligned parallel data

Corpus                   Size (sens)   Rules with Structure   Rules (count>=2)   Complete Lexical Rules
Parallel Treebank (3K)   3,343         45,266                 1,962              11,521
993 sentences            993           12,661                 331                2,199
Parallel Treebank (7K)   6,541         41,998                 1,756              16,081
Merged Corpus set        10K           94,117                 3,160              29,340
Translation Example
• SrcSent 3: 澳洲是与北韩有邦交的少数国家之一。
• Gloss: Australia is with north korea have diplomatic relations DE few country world
• Reference: Australia is one of the few countries that have diplomatic relations with North Korea.
• Translation: Australia is one of the few countries that has diplomatic relations with north korea .
Overall: -5.77439, Prob: -2.58631, Rules: -0.66874, TransSGT: -2.58646,
TransTGS: -1.52858, Frag: -0.0413927, Length: -0.127525, Words: 11,15
( 0 10 "Australia is one of the few countries that has diplomatic relations with
north korea" -5.66505 "澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 " "(S1,1124731
(S,1157857 (NP,2 (NB,1 (LDC_N,1267 'Australia') ) ) (VP,1046077 (MISC_V,1
'is') (NP,1077875 (LITERAL 'one') (LITERAL 'of') (NP,1045537 (NP,1017929
(NP,1 (LITERAL 'the') (NUMNB,2 (LDC_NUM,420 'few') (NB,1 (WIKI_N,62230
'countries') ) ) ) (LITERAL 'that') (VP,1021811 (LITERAL 'has')
(FBIS_NP,11916 'diplomatic relations') ) ) (FBIS_PP,84791 'with north korea')
) ) ) ) ) ")
( 10 11 "." -11.9549 "。" "(MISC_PUNC,20 '.')")
Example: Syntactic Lexical Phrases
• (LDC_N,1267 'Australia')
• (WIKI_N,62230 'countries')
• (FBIS_NP,11916 'diplomatic relations')
• (FBIS_PP,84791 'with north korea')
Example: XFER Rules
;;SL::(2,4) 对 台 贸易
;;TL::(3,5) trade to taiwan
;;Score::22
{NP,1045537}
NP::NP [PP NP ] -> [NP PP ]
((*score* 0.916666666666667)
(X2::Y1)
(X1::Y2))
;;SL::(2,7) 直接 提到 伟 哥 的 广告
;;TL::(1,7) commercials that directly mention the name viagra
;;Score::5
{NP,1017929}
NP::NP [VP "的" NP ] -> [NP "that" VP ]
((*score* 0.111111111111111)
(X3::Y1)
(X1::Y3))
;;SL::(4,14) 有 一 至 多 个 高 新 技术 项目 或 产品
;;TL::(3,14) has one or more new , high level technology projects or products
;;Score::4
{VP,1021811}
VP::VP ["有" NP ] -> ["has" NP ]
((*score* 0.1)
(X2::Y2))
MT Resource Acquisition in
Resource-poor Scenarios
• Scenario: Very limited amounts of parallel-text at
sentence-level are available
– Significant amounts of monolingual text available for one
of the two languages (e.g. English, Spanish)
• Approach:
– Manually acquire and/or construct translation lexicons
– Transfer rule grammars can be manually developed and/or
automatically acquired from an elicitation corpus
• Strategy:
– Learn transfer rules by syntax projection from major language to
minor language
– Build MT system to translate from minor language to major
language
Learning Transfer-Rules for
Languages with Limited Resources
• Rationale:
– Large bilingual corpora not available
– Bilingual native informant(s) can translate and align a
small pre-designed elicitation corpus, using elicitation tool
– Elicitation corpus designed to be typologically
comprehensive and compositional
– Transfer-rule engine and new learning approach support
acquisition of generalized transfer-rules from the data
Elicitation Tool:
English-Hindi Example
Elicitation Tool:
English-Arabic Example
Elicitation Tool:
Spanish-Mapudungun Example
Hebrew-to-English MT Prototype
• Initial prototype developed within a two-month intensive effort
• Accomplished:
– Adapted available morphological analyzer
– Constructed a preliminary translation lexicon
– Translated and aligned Elicitation Corpus
– Learned XFER rules
– Developed (small) manual XFER grammar
– System debugging and development
– Evaluated performance on unseen test data using automatic evaluation metrics
Challenges for Hebrew MT
• Paucity of existing language resources for Hebrew
– No publicly available broad coverage morphological
analyzer
– No publicly available bilingual lexicons or dictionaries
– No POS-tagged corpus or parse tree-bank corpus for
Hebrew
– No large Hebrew/English parallel corpus
• Scenario well suited for Stat-XFER framework
for languages with limited resources
Modern Hebrew Spelling
• Two main spelling variants:
– “KTIV XASER” (deficient): spelling with the vowel diacritics, leaving consonant-only words when the diacritics are removed
– “KTIV MALEH” (full): words with I/O/U vowels are written with long vowels, which include a letter
• KTIV MALEH is predominant, but not strictly adhered to even in newspapers and official publications → inconsistent spelling
• Example:
– niqud (spelling): NIQWD, NQWD, NQD
– When written as NQD, could also be niqed, naqed, nuqad
Morphological Analyzer
• We use a publicly available morphological
analyzer distributed by the Technion’s
Knowledge Center, adapted for our system
• Coverage is reasonable (for nouns, verbs and
adjectives)
• Produces all analyses or a disambiguated
analysis for each word
• Output format includes lexeme (base form),
POS, morphological features
• Output was adapted to our representation
needs (POS and feature mappings)
Morphology Example
• Input word: B$WRH
0     1     2     3     4
|--------B$WRH--------|
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|
Morphology Example
Y0: ((SPANSTART 0)
(SPANEND 4)
(LEX B$WRH)
(POS N)
(GEN F)
(NUM S)
(STATUS ABSOLUTE))
Y1: ((SPANSTART 0)
(SPANEND 2)
(LEX B)
(POS PREP))
Y2: ((SPANSTART 1)
(SPANEND 3)
(LEX $WR)
(POS N)
(GEN M)
(NUM S)
(STATUS ABSOLUTE))
Y3: ((SPANSTART 3)
(SPANEND 4)
(LEX $LH)
(POS POSS))
Y4: ((SPANSTART 0)
(SPANEND 1)
(LEX B)
(POS PREP))
Y5: ((SPANSTART 1)
(SPANEND 2)
(LEX H)
(POS DET))
Y6: ((SPANSTART 2)
(SPANEND 4)
(LEX $WRH)
(POS N)
(GEN F)
(NUM S)
(STATUS ABSOLUTE))
Y7: ((SPANSTART 0)
(SPANEND 4)
(LEX B$WRH)
(POS LEX))
Translation Lexicon
• Constructed our own Hebrew-to-English lexicon, based
primarily on existing “Dahan” H-to-E and E-to-H
dictionary made available to us, augmented by other
public sources
• Coverage is not great but not bad as a start
– Dahan H-to-E is about 15K translation pairs
– Dahan E-to-H is about 7K translation pairs
• Base forms, POS information on both sides
• Converted Dahan into our representation, added entries
for missing closed-class entries (pronouns, prepositions,
etc.)
• Had to deal with spelling conventions
• Recently augmented with ~50K translation pairs
extracted from Wikipedia (mostly proper names and
named entities)
Manual Transfer Grammar
(human-developed)
• Initially developed by Alon in a couple of days,
extended and revised by Nurit over time
• Current grammar has 36 rules:
– 21 NP rules
– one PP rule
– 6 verb complexes and VP rules
– 8 higher-phrase and sentence-level rules
• Captures the most common (mostly local)
structural differences between Hebrew and
English
Transfer Grammar
Example Rules
{NP1,2}
;;SL: $MLH ADWMH
;;TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
(X2::Y1)
(X1::Y2)
((X1 def) = -)
((X1 status) =c absolute)
((X1 num) = (X2 num))
((X1 gen) = (X2 gen))
(X0 = X1)
)

{NP1,3}
;;SL: H $MLWT H ADWMWT
;;TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)
Example Translation
• Input:
– ‫לאחר דיונים רבים החליטה הממשלה לערוך משאל עם בנושא הנסיגה‬
– Gloss: After debates many decided the government to hold
referendum in issue the withdrawal
• Output:
– AFTER MANY DEBATES THE GOVERNMENT DECIDED
TO HOLD A REFERENDUM ON THE ISSUE OF THE
WITHDRAWAL
Noun Phrases – Construct State
‫החלטת הנשיא הראשון‬
HXL@T [HNSIA HRA$WN]
decision.3SF-CS the-president.3SM the-first.3SM
THE DECISION OF THE FIRST PRESIDENT

החלטת הנשיא הראשונה
[HXL@T HNSIA] HRA$WNH
decision.3SF-CS the-president.3SM the-first.3SF
THE FIRST DECISION OF THE PRESIDENT
Noun Phrases - Possessives
‫הנשיא הכריז שהמשימה הראשונה שלו תהיה למצוא פתרון לסכסוך באזורנו‬
HNSIA HKRIZ $HM$IMH HRA$WNH $LW THIH
the-president announced that-the-task.3SF the-first.3SF of-him will.3SF
LMCWA PTRWN LSKSWK BAZWRNW
to-find solution to-the-conflict in-region-POSS.1P

Without transfer grammar:
THE PRESIDENT ANNOUNCED THAT THE TASK THE BEST OF HIM WILL BE TO FIND SOLUTION TO THE CONFLICT IN REGION OUR

With transfer grammar:
THE PRESIDENT ANNOUNCED THAT HIS FIRST TASK WILL BE TO FIND A SOLUTION TO THE CONFLICT IN OUR REGION
Subject-Verb Inversion
‫אתמול הודיעה הממשלה שתערכנה בחירות בחודש הבא‬
ATMWL HWDI&H HMM$LH
yesterday announced.3SF the-government.3SF
$T&RKNH BXIRWT BXWD$ HBA
that-will-be-held.3PF elections.3PF in-the-month the-next

Without transfer grammar:
YESTERDAY ANNOUNCED THE GOVERNMENT THAT WILL RESPECT OF THE FREEDOM OF THE MONTH THE NEXT

With transfer grammar:
YESTERDAY THE GOVERNMENT ANNOUNCED THAT ELECTIONS WILL ASSUME IN THE NEXT MONTH
Subject-Verb Inversion
‫לפני כמה שבועות הודיעה הנהלת המלון שהמלון יסגר בסוף השנה‬
LPNI KMH $BW&WT HWDI&H HNHLT HMLWN
before several weeks announced.3SF management.3SF.CS the-hotel
$HMLWN ISGR BSWF H$NH
that-the-hotel.3SM will-be-closed.3SM at-end.3SM.CS the-year

Without transfer grammar:
IN FRONT OF A FEW WEEKS ANNOUNCED ADMINISTRATION THE HOTEL THAT THE HOTEL WILL CLOSE AT THE END THIS YEAR

With transfer grammar:
SEVERAL WEEKS AGO THE MANAGEMENT OF THE HOTEL ANNOUNCED THAT THE HOTEL WILL CLOSE AT THE END OF THE YEAR
Evaluation Results
• Test set of 62 sentences from Haaretz
newspaper, 2 reference translations
System    BLEU     NIST     P        R        METEOR
No Gram   0.0616   3.4109   0.4090   0.4427   0.3298
Learned   0.0774   3.5451   0.4189   0.4488   0.3478
Manual    0.1026   3.7789   0.4334   0.4474   0.3617
Open Research Questions
• Our large-scale Chinese-English system is still
significantly behind phrase-based SMT. Why?
– Weaker decoder?
– Feature set is not sufficiently discriminant?
– Problems with the parsers for the two sides?
– Syntactic constituents don't provide sufficient coverage?
– Bugs and deficiencies in the underlying algorithms?
• The ISI experience indicates that it may take a couple
of years to catch up with and surpass the phrase-based
systems
• Significant engineering issues to improve speed and
efficient runtime processing and improved search
Open Research Questions
• Immediate Research Issues:
– Rule Learning:
• Study effects of learning rules from manually vs. automatically word-aligned data
• Study effects of parser accuracy on learned rules
• Effective discriminant methods for modeling rule scores
• Rule filtering strategies
– Syntax-based LMs:
• Our translations come out with a syntax-tree attached
to them
• Add a syntax-based LM feature that can discriminate
between good and bad trees
Conclusions
• Stat-XFER is a promising general MT framework,
suitable to a variety of MT scenarios and languages
• Provides a complete solution for building end-to-end MT
systems from parallel data, akin to phrase-based SMT
systems (training, tuning, runtime system)
• No open-source publicly available toolkits (yet), but we welcome further collaboration activities
• Complex but highly interesting set of open research
issues
• Prediction: this is the future direction of MT!
Questions?
Current and Future Work
• Issues specific to the Hebrew-to-English system:
– Coverage: further improvements in the translation lexicon
and morphological analyzer
– Manual Grammar development
– Acquiring/training of word-to-word translation probabilities
– Acquiring/training of a Hebrew language model at a post-morphology level that can help with disambiguation
• General Issues related to XFER framework:
– Discriminative Language Modeling for MT
– Effective models for assigning scores to transfer rules
– Improved grammar learning
– Merging/integration of manual and acquired grammars
Conclusions
• Test case for the CMU XFER framework for
rapid MT prototyping
• Preliminary system was a two-month, three-person effort – we were quite happy with the outcome
• Core concept of XFER + Decoding is very
powerful and promising for MT
• We experienced the main bottlenecks of
knowledge acquisition for MT: morphology,
translation lexicons, grammar...
Sample Output (dev-data)
maxwell anurpung comes from ghana for israel four years
ago and since worked in cleaning in hotels in eilat
a few weeks ago announced if management club hotel that
for him to leave israel according to the government
instructions and immigration police
in a letter in broken english which spread among the
foreign workers thanks to them hotel for their hard work
and announced that will purchase for hm flight tickets
for their countries from their money
Some Syntactic Challenges for
Hebrew-English MT
• Possessor Dative Construction
hitkalkela
la-nu ha-mexonit
broke-down to-us the-car
Our car broke down.
• Anaphor resolution
ha-memSala arxa et
yeSivata ha-riSona
the-government held ACC her-meeting the-first
The government held its first meeting.
Input
פגישתם
PGI$TM
pgiSat-am
meeting.3SF-POSS.3PM

Morph. Analysis
(
(SPANSTART 0)
(SPANEND 1)
(SCORE 1)
(LEX PGI$H)
(POS N)
(GEN feminine)
(NUM singular)
(STATUS absolute)
)
(
(SPANSTART 1)
(SPANEND 2)
(SCORE 1)
(LEX *PRO*)
(POS PRO)
(TRANS *PRO*)
(GEN masculine)
(NUM plural)
(PER 3)
(CASE possessive)
)

Transfer Rules
{NP0,2}
NP0::NP0 [N PRO] -> [N]
(
(X1::Y1)
((X2 case) = possessive)
((X0 possessor) = X2)
((X0 def) = +)
((Y1 num) = (X1 num))
(X0 = X1)
(Y0 = X0)
)
{NP,3}
NP::NP [NP2] -> [PRO NP2]
(
(X1::Y2)
((X1 possessor) =c *DEFINED*)
((Y1 case) = (X1 possessor case))
((Y1 per) = (X1 possessor person))
((Y1 num) = (X1 possessor num))
((Y1 gen) = (X1 possessor gen))
(X0 = X1)
(Y0 = Y2)
)

Output
THEIR MEETING
Morphological Processing
• Split attached prefixes and suffixes into
separate words for translation
• Produce feature-structures as output
• Convert feature-value codes to our
conventions
• “All analyses mode”: all possible analyses for
each input word returned, represented in the
form of a input lattice
• Analyzer installed as a server integrated with the input pre-processor (lattice construction sketched below)
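A minimal sketch of how "all analyses mode" output could be turned into input-lattice arcs, reusing the SPANSTART/SPANEND/LEX/POS field names from the B$WRH example; the function itself is illustrative, not the actual pre-processor:

```python
# Sketch: one lattice arc per morphological analysis.

def analyses_to_lattice(analyses):
    """Each analysis is a dict with SPANSTART, SPANEND, LEX, POS, ...;
    in 'all analyses mode' every analysis becomes one lattice arc."""
    return [(a["SPANSTART"], a["SPANEND"], a["LEX"], a["POS"])
            for a in analyses]

# A few of the B$WRH analyses from the earlier example:
analyses = [
    {"SPANSTART": 0, "SPANEND": 4, "LEX": "B$WRH", "POS": "N"},
    {"SPANSTART": 0, "SPANEND": 1, "LEX": "B", "POS": "PREP"},
    {"SPANSTART": 1, "SPANEND": 2, "LEX": "H", "POS": "DET"},
    {"SPANSTART": 2, "SPANEND": 4, "LEX": "$WRH", "POS": "N"},
]
print(analyses_to_lattice(analyses))
```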
Challenges and Future Directions
• Our approach for learning transfer rules is
applicable to the large parallel data scenario,
subject to solutions for several big challenges:
– No elicitation corpus → break down parallel sentences into reasonable learning examples
– Working with less reliable automatic word alignments
rather than manual alignments
– Effective use of reliable parse structures for ONE
language (i.e. English) and automatic word
alignments in order to decompose the translation of
a sentence into several compositional rules.
– Effective scoring of resulting very large transfer
grammars, and scaled up transfer + decoding
Challenges and Future Directions
• Automatic Transfer Rule Learning:
– Learning mappings for non-compositional structures
– Effective models for rule scoring for
• Decoding: using scores at runtime
• Pruning the large collections of learned rules
– Learning Unification Constraints
– In the absence of morphology or POS annotated
lexica
• Integrated Xfer Engine and Decoder
– Improved models for scoring tree-to-tree mappings,
integration with LM and other knowledge sources in
the course of the search
Hebrew Text Encoding Issues
• Input texts are (most commonly) in standard
Windows encoding for Hebrew, but also
unicode (UTF-8) and others…
• Morphology analyzer and other resources
already set to work in a romanized “ascii-like”
representation
• → Converter script converts the input into the romanized representation – a 1-to-1 mapping (sketched below)
• All further processing is done in the romanized
representation
• Lexicon and grammar rules are also converted
into romanized representation
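A minimal sketch of the 1-to-1 romanization step. The mapping table below is partial and assumed, chosen to be consistent with the examples in these slides ($ = shin, & = ayin, @ = tet); the real converter covers the full alphabet and the supported input encodings:

```python
# Sketch: 1-to-1 Hebrew-to-ASCII romanization (partial, assumed table).

HEB2ASCII = {"א": "A", "ב": "B", "ה": "H", "ו": "W", "ח": "X",
             "ט": "@", "ע": "&", "ר": "R", "ש": "$"}

def romanize(text):
    # 1-to-1 character mapping; unmapped characters pass through
    return "".join(HEB2ASCII.get(ch, ch) for ch in text)

print(romanize("בשורה"))  # -> B$WRH
```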
XFER + Decoder
• XFER engine produces a lattice of all possible
transferred fragments
• Decoder searches for and selects the best
scoring sequence of fragments as a final
translation output
• Main advantages:
– Very high robustness
• always some translation output
• no transfer grammar → word-to-word translation
– Scoring can take into account word-to-word
translation probabilities, transfer rule scores, target
statistical language model
– Effective framework for late-stage disambiguation
• Main Difficulty: lattice size too big → pruning
Modern Hebrew
• Native language of about 3-4 Million in Israel
• Semitic language, closely related to Arabic and
with similar linguistic properties
– Root+Pattern word formation system
– Rich verb and noun morphology
– Particles attach as prefixes to the following word:
definite article (H), prepositions (B,K,L,M),
coordinating conjunction (W), relativizers ($,K$)…
• Unique alphabet and Writing System
– 22 letters represent (mostly) consonants
– Vowels represented (mostly) by diacritics
– Modern texts omit the diacritic vowels, adding another level of ambiguity: “bare” word → word
– Example: MHGR → mehager, m+hagar, m+h+ger
The Transfer Engine
Example: 他 看 书。 (he read book)
• Analysis: source text is parsed into its grammatical structure, which determines transfer application ordering:
(S (NP (N 他)) (VP (V 看) (NP 书)))
• Transfer: a target-language tree is created by reordering, insertion, and deletion. The article "a" is inserted into the object NP, and source words are translated with the transfer lexicon:
(S (NP (N he)) (VP (V read) (NP (DET a) (N book))))
• Generation: target-language constraints are checked and the final translation is produced; e.g. "reads" is chosen over "read" to agree with "he".
Final translation: "He reads a book"
Elicitation Tool:
English-Chinese Example
Elicitation Tool:
English-Chinese Example
English-Hindi Example
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source
side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at
transfer rules; flat syntactic structure
2. Compositionality: use previously learned
rules to add hierarchical structure
3. Seeded Version Space Learning: refine
rules by learning appropriate feature
constraints
Flat Seed Rule Generation
Learning Example: NP
Eng:
the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule (generation sketched below):
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
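A small sketch of seed-rule generation from one elicitation example, with illustrative representations (POS sequences plus 0-based word-alignment links):

```python
# Sketch: build a flat seed rule from a POS-tagged, word-aligned pair.

def seed_rule(s_cat, t_cat, src_pos, tgt_pos, links):
    head = "{}::{} [{}] -> [{}]".format(
        s_cat, t_cat, " ".join(src_pos), " ".join(tgt_pos))
    aligns = "\n".join("(X{}::Y{})".format(s + 1, t + 1) for s, t in links)
    return "{}\n({})".format(head, aligns)

# Eng: the big apple / Heb: ha-tapuax ha-gadol
print(seed_rule("NP", "NP",
                ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"],
                [(0, 0), (0, 2), (1, 3), (2, 1)]))
# NP::NP [ART ADJ N] -> [ART N ART ADJ]
# ((X1::Y1)
# (X1::Y3)
# (X2::Y4)
# (X3::Y2))
```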
Compositionality
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule (derivation sketched below):
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
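A minimal sketch of the substitution at the heart of the compositionality step; alignment renumbering is omitted and the representations are illustrative:

```python
# Sketch: replace a sub-span covered by a learned rule with its
# constituent label (one contiguous substitution per call).

def compose(seq, sub, label):
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return seq[:i] + [label] + seq[i + len(sub):]
    return seq

src = ["ART", "ADJ", "N", "V", "ART", "N"]
src = compose(src, ["ART", "ADJ", "N"], "NP")   # first learned NP rule
src = compose(src, ["ART", "N"], "NP")          # second learned NP rule
print(src)  # ['NP', 'V', 'NP'] — the source side of the S::S rule above
```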
Seeded Version Space Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
{ex1,ex12,ex17,ex26}
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
{ex4,ex5,ex6,ex8,ex10,ex11}
Output: Rules with Feature Constraints (induction sketched below):
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
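A minimal sketch of constraint induction in the spirit of the output above: propose an agreement constraint only if it holds in every example of the rule's example set (a simplified stand-in for the full seeded version-space search):

```python
# Sketch: keep a candidate agreement constraint only if all examples
# in the rule's example set satisfy it.

def induce_agreement(examples, pairs, feat="NUM"):
    """examples: list of dicts mapping constituents (X1, Y1, ...) to
    feature dicts; pairs: candidate constituent pairs to test."""
    return [(a, b, feat) for a, b in pairs
            if all(ex[a].get(feat) == ex[b].get(feat) for ex in examples)]

examples = [{"X1": {"NUM": "s"}, "X2": {"NUM": "s"},
             "Y1": {"NUM": "s"}, "Y2": {"NUM": "s"}}]
pairs = [("X1", "X2"), ("Y1", "Y2"), ("X1", "Y1")]
print(induce_agreement(examples, pairs))
# [('X1', 'X2', 'NUM'), ('Y1', 'Y2', 'NUM'), ('X1', 'Y1', 'NUM')]
# i.e. (X1 NUM = X2 NUM), (Y1 NUM = Y2 NUM), (X1 NUM = Y1 NUM)
```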
Statistical XFER:
Hybrid Statistical Rule-based
Machine Translation
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint work with:
Jaime Carbonell, Lori Levin, Bob Frederking, Erik Peterson,
Christian Monson, Vamshi Ambati, Greg Hanneman, Kathrin
Probst, Ariadna Font-Llitjos, Alison Alvarez, Roberto Aranovich
Outline
• Background and Rationale
• Stat-XFER Framework Overview
• Elicitation
• Learning Transfer Rules
• Automatic Rule Refinement
• Example Prototypes
• Major Research Challenges
Progression of MT
• Started with rule-based systems
– Very large expert human effort to construct language-specific resources (grammars, lexicons)
– High-quality MT extremely expensive → only for a handful of language pairs
• Along came EBMT and then Statistical MT…
– Replaced human effort with extremely large volumes of
parallel text data
– Less expensive, but still only feasible for a small number of
language pairs
– We “traded” human labor for data
• Where does this take us in 5-10 years?
– Large parallel corpora for maybe 25-50 language pairs
• What about all the other languages?
• Is all this data (with very shallow representation of
language structure) really necessary?
• Can we build MT approaches that learn deeper levels of
language structure and how they map from one
language to another?
Hebrew Input
בשורה הבאה

[System pipeline: Preprocessing → Morphology → Transfer Engine (consulting the Transfer Rules and Translation Lexicon) → Decoder (with scoring features) → English Output]

Transfer Rules
{NP1,3}
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
((X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1))

Translation Lexicon
N::N |: ["$WR"] -> ["BULL"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "BULL"))
N::N |: ["$WRH"] -> ["LINE"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "LINE"))

Translation Output Lattice
(0 1 "IN" @PREP)
(1 1 "THE" @DET)
(2 2 "LINE" @N)
(1 2 "THE LINE" @NP)
(0 2 "IN LINE" @PP)
(0 4 "IN THE NEXT LINE" @PP)

English Output
in the next line
Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
; Alignments
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; x-side constraints
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
; y-side constraints
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
Each rule carries type information (NP::NP), part-of-speech/constituent sequences, alignments, x-side and y-side constraints, and optionally xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).
Transfer Rule Formalism (II)
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; Value constraints
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
; Agreement constraint
((Y2 GENDER) = (Y4 GENDER))
)
Hebrew Manual Transfer Grammar
(human-developed)
• Initially developed in a couple of days, with
some later revisions by a CL post-doc
• Current grammar has 36 rules:
– 21 NP rules
– one PP rule
– 6 verb complexes and VP rules
– 8 higher-phrase and sentence-level rules
• Captures the most common (mostly local)
structural differences between Hebrew and
English
Source-language Confusion Network
Hebrew Example
• Input word: B$WRH
0     1     2     3     4
|--------B$WRH--------|
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" -17.3393 "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" -13.9523 "&BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" -18.0325 "&BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" -21.0649 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
The Lattice Decoder
• Simple Stack Decoder, similar in principle to simple
Statistical MT decoders
• Searches for best-scoring path of non-overlapping
lattice arcs
• No reordering during decoding
• Scoring based on log-linear combination of scoring
components, with weights trained using MERT
• Scoring components:
– Statistical Language Model
– Fragmentation: how many arcs to cover the entire
translation?
– Length Penalty
– Rule Scores
– Lexical Probabilities
XFER Lattice Decoder
00
ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0,
Words: 13,13
235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE')
(NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0
(NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100
(NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A')
(NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>
Data Elicitation for Languages with
Limited Resources
• Rationale:
– Large volumes of parallel text not available → create a small maximally-diverse parallel corpus that directly supports the learning task
– Bilingual native informant(s) can translate and align
a small pre-designed elicitation corpus, using
elicitation tool
– Elicitation corpus designed to be typologically and
structurally comprehensive and compositional
– Transfer-rule engine and new learning approach
support acquisition of generalized transfer-rules from
the data
Designing Elicitation Corpora
• Goal: Create a small representative parallel corpus that
contains examples of the most important translation
correspondences and divergences between the two languages
• Method:
– Elicit translations and word alignments for a broad diversity of
linguistic phenomena and constructions
• Current Elicitation Corpus: ~3100 sentences and phrases,
constructed based on a broad feature-based specification
• Open Research Issues:
– Feature Detection: discover what features exist in the language
and where/how they are marked
• Example: does the language mark gender of nouns? How and where
are these marked?
– Dynamic corpus navigation based on feature detection: no need
to elicit for combinations involving non-existent features
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source
side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at
transfer rules; flat syntactic structure
2. Compositionality Learning: use previously
learned rules to learn hierarchical structure
3. Constraint Learning: refine rules by
learning appropriate feature constraints
Flat Seed Rule Generation
Learning Example: NP
Eng:
the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
Compositionality Learning
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
{ex1,ex12,ex17,ex26}
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
{ex4,ex5,ex6,ex8,ex10,ex11}
Output: Rules with Feature Constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
Automated Rule Refinement
• Bilingual informants can identify translation
errors and pinpoint the errors
• A sophisticated trace of the translation path
can identify likely sources for the error and do
“Blame Assignment”
• Rule Refinement operators can be developed
to modify the underlying translation grammar
(and lexicon) based on characteristics of the
error source:
– Add or delete feature constraints from a rule
– Bifurcate a rule into two rules (general and specific)
– Add or correct lexical entries
• See [Font-Llitjos, Carbonell & Lavie, 2005]
Stat-XFER MT Prototypes
• General Statistical XFER framework under development for
past five years (funded by NSF and DARPA)
• Prototype systems so far:
– Chinese-to-English
– Dutch-to-English
– French-to-English
– Hindi-to-English
– Hebrew-to-English
– Mapudungun-to-Spanish
• In progress or planned:
– Brazilian Portuguese-to-English
– Native-Brazilian languages to Brazilian Portuguese
– Hebrew-to-Arabic
– Iñupiaq-to-English
– Urdu-to-English
– Turkish-to-English
Chinese-English Stat-XFER System
• Bilingual lexicon: over 1.1 million entries (multiple
resources, incl. ADSO, Wikipedia, extracted base NPs)
• Manual syntactic XFER grammar: 76 rules! (mostly
NPs, a few PPs, and reordering of NPs/PPs within VPs)
• Multiple overlapping Chinese word segmentations
• English morphology generation
• Uses CMU SMT-group’s Suffix-Array LM toolkit for LM
• Current Performance (GALE dev-test):
– NW:
• XFER: 10.89 (B) / 0.4509 (M)
• Best (UMD): 15.58 (B) / 0.4769 (M)
– NG:
• XFER: 8.92 (B) / 0.4229 (M)
• Best (UMD): 12.96 (B) / 0.4455 (M)
• In Progress:
– Automatic extraction of “clean” base NPs from parallel data
– Automatic learning and extraction of high-quality transfer rules from parallel data
Translation Example
• REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt.
• Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem , Yanukovich replied : " of course .
• IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula , the Queen Vic said : " of course .
• CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue , Yanukovych vetch replied : " of course .
• maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in , replied: "of course.
• IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course.
• CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula , replied : " of course .
Major Research Directions
• Automatic Transfer Rule Learning:
– From manually word-aligned elicitation corpus
– From large volumes of automatically word-aligned
“wild” parallel data
– In the absence of morphology or POS annotated
lexica
– Compositionality and generalization
– Identifying “good” rules from “bad” rules
– Effective models for rule scoring for
• Decoding: using scores at runtime
• Pruning the large collections of learned rules
– Learning Unification Constraints
Major Research Directions
• Extraction of Base-NP translations from parallel data:
– Base-NPs are extremely important “building blocks” for
transfer-based MT systems
• Frequent, often align 1-to-1, improve coverage
• Correctly identifying them greatly helps automatic word-alignment of parallel sentences
– Parsers (or NP-chunkers) available for both languages:
Extract base-NPs independently on both sides and find
their correspondences
– Parsers (or NP-chunkers) available for only one language
(i.e. English): Extract base-NPs on one side, and find
reliable correspondences for them using word-alignment,
frequency distributions, other features…
• Promising preliminary results
Major Research Directions
• Algorithms for XFER and Decoding
– Integration and optimization of multiple
features into search-based XFER parser
– Complexity and efficiency improvements
(i.e. “Cube Pruning”)
– Non-monotonicity issues (LM scores,
unification constraints) and their
consequences on search
Major Research Directions
• Building Elicitation Corpora:
– Feature Detection
– Corpus Navigation
• Automatic Rule Refinement
• Translation for highly polysynthetic
languages such as Mapudungun and
Iñupiaq
Questions?
Recent Performance
Analysis
• What fraction of the time does each MT system produce
the best translation (sentence-by-sentence)?
• Evaluated on Chinese GALE dev-test (text) data
System                         BLEU                METEOR
CMU-PhraseSyntaxCombination    60 of 284 (21.1%)   41 of 284 (14.4%)
IBM-smt                        50 of 284 (17.6%)   49 of 284 (17.2%)
IBM-ylee                       64 of 284 (22.5%)   50 of 284 (17.6%)
maryland-jhu-combination       71 of 284 (25.0%)   77 of 284 (27.1%)
Stat-XFER                      32 of 284 (11.2%)   56 of 284 (19.7%)
Outline
• Rationale for learning-based MT
• Roadmap for learning-based MT
• Framework overview
• Elicitation
• Learning transfer Rules
• Automatic rule refinement
• Example prototypes
• Major Research Challenges
Outline
• Rationale for learning-based MT
• Roadmap for learning-based MT
• Framework overview
• Elicitation
• Learning transfer Rules
• Automatic rule refinement
• Example prototypes
• Implications for MT with vast parallel data
• Conclusions and future directions
Stat-XFER Prototypes
• General XFER framework under development for past
five years
• Prototype systems so far:
– German-to-English, Dutch-to-English
– Chinese-to-English
– Hindi-to-English
– Hebrew-to-English
– Portuguese-to-English
• In progress or planned:
– Mapudungun-to-Spanish
– Quechua-to-Spanish
– Arabic-to-English
– Native-Brazilian languages to Brazilian Portuguese
CMU’s Statistical-Transfer
(XFER) Approach
• Framework: Statistical search-based approach with
syntactic translation transfer rules that can be acquired
from data but also developed and extended by experts
• Elicitation: use bilingual native informants to produce a
small high-quality word-aligned bilingual corpus of
translated phrases and sentences
• Transfer-rule Learning: apply ML-based methods to
automatically acquire syntactic transfer rules for
translation between the two languages
• XFER + Decoder:
– XFER engine produces a lattice of possible transferred
structures at all levels
– Decoder searches and selects the best scoring combination
• Rule Refinement: refine the acquired rules via a process
of interaction with bilingual informants
• Word and Phrase bilingual lexicon acquisition
The Transfer Engine
• Main algorithm: chart-style bottom-up integrated
parsing+transfer with beam pruning
– Seeded by word-to-word translations
– Driven by transfer rules
– Generates a lattice of transferred translation segments at
all levels
• Some Unique Features:
– Works with either learned or manually-developed transfer
grammars
– Handles rules with or without unification constraints
– Supports interfacing with servers for morphological
analysis and generation
– Can handle ambiguous source-word analyses and/or SL
segmentations represented in the form of lattice structures
Why Machine Translation
for Languages with Limited Resources?
• We are in the age of information explosion
– The internet+web+Google → anyone can get the information they want anytime…
• But what about the text in all those other
languages?
– How do they read all this English stuff?
– How do we read all the stuff that they put online?
• MT for these languages would Enable:
– Better government access to native indigenous and
minority communities
– Better minority and native community participation
in information-rich activities (health care, education,
government) without giving up their languages.
– Civilian and military applications (disaster relief)
– Language preservation
The Roadmap to Learning-based MT
• Automatic acquisition of necessary language resources
and knowledge using machine learning methodologies:
– Learning morphology (analysis/generation)
– Rapid acquisition of broad coverage word-to-word and
phrase-to-phrase translation lexicons
– Learning of syntactic structural mappings
• Tree-to-tree structure transformations [Knight et al], [Eisner],
[Melamed] require parse trees for both languages
• Learning syntactic transfer rules with resources (grammar,
parses) for just one of the two languages
– Automatic rule refinement and/or post-editing
• A framework for integrating the acquired MT resources
into effective MT prototype systems
• Effective integration of acquired knowledge with
statistical/distributional information
CMU’s AVENUE Approach
• Elicitation: use bilingual native informants to create a
small high-quality word-aligned bilingual corpus of
translated phrases and sentences
– Building Elicitation corpora from feature structures
– Feature Detection and Navigation
• Transfer-rule Learning: apply ML-based methods to
automatically acquire syntactic transfer rules for
translation between the two languages
– Learn from major language to minor language
– Translate from minor language to major language
• XFER + Decoder:
– XFER engine produces a lattice of possible transferred
structures at all levels
– Decoder searches and selects the best scoring combination
• Rule Refinement: refine the acquired rules via a process
of interaction with bilingual informants
• Morphology Learning
• Word and Phrase bilingual lexicon acquisition
AVENUE Architecture
• Word-aligned elicited data feeds the Learning Module, which produces Transfer Rules, e.g.:
  {PP,4894}
  ;;Score:0.0470
  PP::PP [NP POSTP] -> [PREP NP]
  ((X2::Y1)
   (X1::Y2))
• The Run-Time Transfer System applies these rules together with the Translation Lexicon and its word-to-word translation probabilities, producing a Lattice
• The Decoder searches the lattice and scores paths with an English Language Model
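The transfer rule shown in the diagram has a regular shape: a type signature, source and target constituent sequences, alignments, and a score. A sketch of how such a rule might be held in memory; the field names are illustrative, not Stat-XFER's actual internals:

from dataclasses import dataclass, field

@dataclass
class TransferRule:
    rule_id: str                   # e.g. "{PP,4894}"
    score: float                   # e.g. 0.0470
    src_type: str                  # left of "::", source category
    tgt_type: str                  # right of "::", target category
    src_rhs: list                  # source constituent sequence
    tgt_rhs: list                  # target constituent sequence
    alignments: list = field(default_factory=list)   # (src_idx, tgt_idx), 1-based
    constraints: list = field(default_factory=list)  # unification constraints, if any

# The example rule from the architecture diagram:
pp_rule = TransferRule(
    rule_id="{PP,4894}", score=0.0470,
    src_type="PP", tgt_type="PP",
    src_rhs=["NP", "POSTP"], tgt_rhs=["PREP", "NP"],
    alignments=[(2, 1), (1, 2)],   # (X2::Y1), (X1::Y2)
)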
The Transfer Engine
• Analysis: source text is parsed into its grammatical structure, which determines the ordering of transfer rule application.
  Example: 他 看 书。 (he read book)
  [Source tree: S → NP VP; NP → N 他; VP → V NP, with V 看 and NP 书]
• Transfer: a target-language tree is created by reordering, insertion, and deletion. Source words are translated with the transfer lexicon; the article “a” is inserted into the object NP.
  [Target tree: S → NP VP; NP → N he; VP → V NP, with V read and NP → DET N “a book”]
• Generation: target-language constraints are checked and the final translation is produced; e.g., “reads” is chosen over “read” to agree with “he”.
• Final translation: “He reads a book”
The Lattice Decoder
• Simple Stack Decoder, similar in principle to
SMT/EBMT decoders
• Searches for the best-scoring path of non-overlapping lattice arcs
• Scoring based on log-linear combination of
scoring components (no minimum error-rate training yet)
• Scoring components:
– Standard trigram LM
– Fragmentation: how many arcs to cover the entire
translation?
– Length Penalty
– Rule Scores (not fully integrated yet)
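A minimal dynamic-programming rendering of this search: choose the best-scoring sequence of non-overlapping arcs that covers the input, scored as a log-linear combination of the listed components. The weights, the arc format, and the lm_logprob callable are assumptions for illustration; the actual decoder is a stack decoder with a trigram LM:

import math

def decode(n, arcs, lm_logprob, weights=(1.0, -0.5, 0.1, 1.0)):
    """arcs: list of (start, end, rule_score, translation); n: input length.
    Score = w_lm*LM + w_frag*(#arcs) + w_len*len(output) + w_rule*rule_score."""
    w_lm, w_frag, w_len, w_rule = weights      # w_frag < 0 penalizes fragmentation
    best = [(-math.inf, None)] * (n + 1)       # best[i] = (score, backpointer)
    best[0] = (0.0, None)
    by_start = {}
    for a in arcs:
        by_start.setdefault(a[0], []).append(a)
    for i in range(n):
        if best[i][0] == -math.inf:
            continue
        for (s, e, rs, trans) in by_start.get(i, []):
            score = (best[i][0]
                     + w_lm * lm_logprob(trans)
                     + w_frag                   # one more arc used
                     + w_len * len(trans.split())
                     + w_rule * rs)
            if score > best[e][0]:
                best[e] = (score, (i, trans))
    out, i = [], n                              # follow backpointers
    while i > 0 and best[i][1]:
        i, trans = best[i][1]
        out.append(trans)
    return " ".join(reversed(out))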
Typological Elicitation Corpus
• Feature Detection
– Discover what features exist in the language and
where/how they are marked
• Example: does the language mark gender of nouns?
How and where are these marked?
– Method: compare translations of minimal pairs –
sentences that differ in only ONE feature
• Elicit translations/alignments for detected
features and their combinations
• Dynamic corpus navigation based on feature
detection: no need to elicit for combinations
involving non-existent features
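The minimal-pair test reduces to a simple comparison over elicited translations: if two sentences that differ in only one feature value come back with different target strings, the language plausibly marks that feature overtly. A toy sketch; the Hebrew transliterations below are hypothetical illustrative data, not the actual elicitation corpus:

def feature_is_marked(pairs):
    """pairs: list of (feature_value, elicited_translation) for one minimal pair set,
    e.g. the same sentence elicited once per value of a single feature.
    Returns True if any two feature values yield different translations."""
    translations = {t for _, t in pairs}
    return len(translations) > 1

# Hypothetical elicitation of "the teacher fell", varying only the noun's gender:
gender_pair = [("masc", "ha-more nafal"), ("fem", "ha-mora nafla")]
print(feature_is_marked(gender_pair))   # True: gender is marked on noun and verb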
Typological Elicitation Corpus
• Initial typological corpus of about 1000
sentences was manually constructed
• New construction methodology for building an
elicitation corpus using:
– A feature specification: lists inventory of available
features and their values
– A definition of the set of desired feature structures
• Schemas define sets of desired combinations of
features and values
• Multiplier algorithm generates the comprehensive set
of feature structures
– A generation grammar and lexicon: NLG generator
generates NL sentences from the feature structures
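The multiplier step is, at its core, a cross-product over the feature values a schema requests; a sketch under an assumed schema format:

from itertools import product

def multiply(schema):
    """schema: dict mapping feature name -> list of values to cover.
    Yields one flat feature structure per combination."""
    names = sorted(schema)
    for values in product(*(schema[n] for n in names)):
        yield dict(zip(names, values))

# A toy schema crossing number and person for an intransitive clause:
schema = {"np-num": ["sg", "pl"], "np-person": ["1", "2", "3"]}
for fs in multiply(schema):
    print(fs)    # 6 feature structures, each handed to the NLG generator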
Structural Elicitation Corpus
• Goal: create a compact diverse sample corpus of
syntactic phrase structures in English in order to elicit
how these map into the elicited language
• Methodology:
– Extracted all CFG “rules” from Brown section of Penn
TreeBank (122K sentences)
– Simplified POS tag set
– Constructed frequency histogram of extracted rules
– Pulled out simplest phrases for most frequent rules for
NPs, PPs, ADJPs, ADVPs, SBARs and Sentences
– Some manual inspection and refinement
• Resulting corpus of about 120 phrases/sentences
representing common structures
• See [Probst and Lavie, 2004]
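The extraction step amounts to counting productions over treebank parses. A sketch using NLTK's bundled WSJ sample as a stand-in for the Brown section used here (requires the nltk treebank data):

from collections import Counter
from nltk.corpus import treebank   # WSJ sample; the paper used the Brown section

rule_counts = Counter()
for tree in treebank.parsed_sents():
    for prod in tree.productions():
        if prod.is_nonlexical():       # keep phrase-structure rules, drop POS -> word
            rule_counts[prod] += 1

# Frequency histogram of extracted rules; the most frequent NP/PP/S rules
# are then paired with their simplest example phrases.
for rule, count in rule_counts.most_common(10):
    print(count, rule)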
Flat Seed Rule Generation
• Create a “flat” transfer rule specific to the
sentence pair, partially abstracted to POS
– Words that are aligned word-to-word and have the
same POS in both languages are generalized to their
POS
– Words that have complex alignments (or not the
same POS) remain lexicalized
• One seed rule for each translation example
• No feature constraints associated with seed
rules (but mark the example(s) from which it
was learned)
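A sketch of the seed-rule construction just described: words in 1-to-1 alignments with matching POS are generalized to their POS, everything else stays lexicalized, and the word alignments become the rule's X/Y indices (the representation is illustrative):

from collections import Counter

def flat_seed_rule(src, tgt, align, src_pos, tgt_pos):
    """src/tgt: token lists; align: set of (i, j) word alignments (0-based);
    src_pos/tgt_pos: POS tag lists. Returns (src_rhs, tgt_rhs, alignments)."""
    src_fan = Counter(i for i, _ in align)    # alignment fan-out per word
    tgt_fan = Counter(j for _, j in align)
    src_rhs, tgt_rhs, links = list(src), list(tgt), []
    for (i, j) in sorted(align):
        one_to_one = src_fan[i] == 1 and tgt_fan[j] == 1
        if one_to_one and src_pos[i] == tgt_pos[j]:
            src_rhs[i] = src_pos[i]           # generalize to POS
            tgt_rhs[j] = tgt_pos[j]
        links.append((i + 1, j + 1))          # 1-based X/Y alignment indices
    return src_rhs, tgt_rhs, links

# Toy example (hypothetical tags and alignments):
print(flat_seed_rule(["I", "saw", "Maria"], ["vi", "a", "Maria"],
                     {(1, 0), (2, 2)}, ["PRO", "V", "N"], ["V", "PREP", "N"]))
# -> (['I', 'V', 'N'], ['V', 'a', 'N'], [(2, 1), (3, 3)])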
Compositionality Learning
• Detection: traverse the c-structure of the
English sentence, add compositional structure
for translatable chunks
• Generalization: adjust constituent sequences
and alignments
• Two implemented variants:
– Safe Compositionality: there exists a transfer rule
that correctly translates the sub-constituent
– Maximal Compositionality: Generalize the rule if
supported by the alignments, even in the absence of
an existing transfer rule for the sub-constituent
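The alignment test behind the maximal variant can be read as the familiar consistency check from phrase extraction: a chunk may be generalized only if its aligned target words form a contiguous block with no links back outside the chunk. A sketch, under that interpretation:

def alignment_supported(span, align):
    """span: (i0, i1) source chunk, half-open; align: set of (i, j) links.
    True if the chunk's target image is contiguous and consistent."""
    i0, i1 = span
    tgt_img = {j for (i, j) in align if i0 <= i < i1}
    if not tgt_img:
        return False
    j0, j1 = min(tgt_img), max(tgt_img) + 1
    # No target word inside the image may link to a source word outside the chunk.
    return all(i0 <= i < i1 for (i, j) in align if j0 <= j < j1)

# Safe compositionality additionally requires an existing transfer rule that
# already translates the sub-constituent; maximal generalizes on alignments alone.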
Constraint Learning
• Goal: add appropriate feature constraints to the
acquired rules
• Methodology:
– Preserve general structural transfer
– Learn specific feature constraints from example set
• Seed rules are grouped into clusters of similar transfer
structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered
hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of
a version space
• The general boundary is the (implicit) transfer rule with
the same type, constituent sequences, and alignments,
but no feature constraints
Constraint Learning: Generalization
• The partial order of the version space:
Definition: A transfer rule tr1 is strictly more general than another transfer rule tr2 if all f-structures that are satisfied by tr2 are also satisfied by tr1.
• Generalize rules by merging them:
– Deletion of constraint
– Raising two value constraints to an agreement
constraint, e.g.
((x1 num) = *pl), ((x3 num) = *pl) →
((x1 num) = (x3 num))
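Both merge operations can be written down compactly over constraints stored as path-to-value maps (a representation assumed here for illustration): keep constraints the examples agree on, drop the rest, and where two dropped paths always co-vary, raise them to a single agreement constraint:

def generalize(c1, c2):
    """c1, c2: dicts mapping feature paths like ('x1','num') to values.
    Returns (surviving value constraints, agreement constraints)."""
    merged, agreements = {}, []
    paths = set(c1) & set(c2)
    for p in paths:
        if c1[p] == c2[p]:
            merged[p] = c1[p]              # identical value constraint survives
    dropped = [p for p in paths if p not in merged]
    for a in dropped:
        for b in dropped:
            if a < b and c1[a] == c1[b] and c2[a] == c2[b]:
                agreements.append((a, b))  # e.g. ((x1 num) = (x3 num))
    return merged, agreements

c1 = {("x1", "num"): "pl", ("x3", "num"): "pl"}
c2 = {("x1", "num"): "sg", ("x3", "num"): "sg"}
print(generalize(c1, c2))   # ({}, [(('x1', 'num'), ('x3', 'num'))])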
Challenges for Hebrew MT
• Paucity of existing language resources for
Hebrew
– No publicly available broad coverage morphological
analyzer
– No publicly available bilingual lexicons or dictionaries
– No POS-tagged corpus or parse tree-bank corpus for
Hebrew
– No large Hebrew/English parallel corpus
• Scenario well suited for CMU transfer-based
MT framework for languages with limited
resources
Hebrew-to-English MT Prototype
• Initial prototype developed within a two month
intensive effort
• Accomplished:
– Adapted available morphological analyzer
– Constructed a preliminary translation lexicon
– Translated and aligned Elicitation Corpus
– Learned XFER rules
– Developed (small) manual XFER grammar as a point of comparison
– System debugging and development
– Evaluated performance on unseen test data using automatic evaluation metrics
Morphology Example
• The ambiguous analyses of the Hebrew word B$WRH, each a feature structure over a character span:
Y0: ((SPANSTART 0)
(SPANEND 4)
(LEX B$WRH)
(POS N)
(GEN F)
(NUM S)
(STATUS ABSOLUTE))
Y1: ((SPANSTART 0)
(SPANEND 2)
(LEX B)
(POS PREP))
Y2: ((SPANSTART 1)
(SPANEND 3)
(LEX $WR)
(POS N)
(GEN M)
(NUM S)
(STATUS ABSOLUTE))
Y3: ((SPANSTART 3)
(SPANEND 4)
(LEX $LH)
(POS POSS))
Y4: ((SPANSTART 0)
(SPANEND 1)
(LEX B)
(POS PREP))
Y5: ((SPANSTART 1)
(SPANEND 2)
(LEX H)
(POS DET))
Y6: ((SPANSTART 2)
(SPANEND 4)
(LEX $WRH)
(POS N)
(GEN F)
(NUM S)
(STATUS ABSOLUTE))
Y7: ((SPANSTART 0)
(SPANEND 4)
(LEX B$WRH)
(POS LEX))
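The analyses above are arcs over character spans, so the analyzer's output is literally a lattice: any sequence of arcs that tiles the span 0..4 is a candidate segmentation the transfer engine must entertain. A sketch that enumerates those tilings from the slide's data:

def segmentations(arcs, end):
    """arcs: list of (spanstart, spanend, lex, pos); yields full tilings of 0..end."""
    def extend(pos, path):
        if pos == end:
            yield path
            return
        for a in arcs:
            if a[0] == pos:
                yield from extend(a[1], path + [a])
    yield from extend(0, [])

# The analyses Y0..Y7 from the slide, reduced to spans and labels:
arcs = [(0, 4, "B$WRH", "N"), (0, 2, "B", "PREP"), (1, 3, "$WR", "N"),
        (3, 4, "$LH", "POSS"), (0, 1, "B", "PREP"), (1, 2, "H", "DET"),
        (2, 4, "$WRH", "N"), (0, 4, "B$WRH", "LEX")]
for seg in segmentations(arcs, 4):
    print([(lex, pos) for (_, _, lex, pos) in seg])
# e.g. [('B$WRH', 'N')], [('B', 'PREP'), ('$WR', 'N'), ('$LH', 'POSS')], ...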
Sample Output (dev-data)
maxwell anurpung comes from ghana for israel four years
ago and since worked in cleaning in hotels in eilat
a few weeks ago announced if management club hotel that
for him to leave israel according to the government
instructions and immigration police
in a letter in broken english which spread among the
foreign workers thanks to them hotel for their hard work
and announced that will purchase for hm flight tickets
for their countries from their money
Evaluation Results
• Test set of 62 sentences from Haaretz
newspaper, 2 reference translations
System    BLEU     NIST     P        R        METEOR
No Gram   0.0616   3.4109   0.4090   0.4427   0.3298
Learned   0.0774   3.5451   0.4189   0.4488   0.3478
Manual    0.1026   3.7789   0.4334   0.4474   0.3617
Hebrew-English:
Test Suite Evaluation
Grammar             BLEU     METEOR
Baseline (NoGram)   0.0996   0.4916
Learned Grammar     0.1608   0.5525
Manual Grammar      0.1642   0.5320
Outline
• Rationale for learning-based MT
• Roadmap for learning-based MT
• Framework overview
• Elicitation
• Learning transfer rules
• Automatic rule refinement
• Learning morphology
• Example prototypes
• Implications for MT with vast parallel data
• Conclusions and future directions
Implications for MT with Vast
Amounts of Parallel Data
• Phrase-to-phrase MT ill suited for long-range
reorderings → ungrammatical output
• Recent work on hierarchical Stat-MT [Chiang, 2005] and
parsing-based MT [Melamed et al, 2005] [Knight et al]
• Learning general tree-to-tree syntactic mappings is
equally problematic:
– Meaning is a hybrid of complex, non-compositional phrases
embedded within a syntactic structure
– Some constituents can be translated in isolation, others
require contextual mappings
Implications for MT with Vast
Amounts of Parallel Data
• Our approach for learning transfer rules is
applicable to the large data scenario, subject
to solutions for several large challenges:
– No elicitation corpus → need to break down parallel
sentences into reasonable learning examples
– Working with less reliable automatic word alignments
rather than manual alignments
– Effective use of reliable parse structures for ONE
language (i.e. English) and automatic word
alignments in order to decompose the translation of
a sentence into several compositional rules.
– Effective scoring of resulting very large transfer
grammars, and scaled up transfer + decoding
Implications for MT with Vast
Amounts of Parallel Data
• Example:
  他 经常 与 江泽民 总统 通 电话
  (gloss: He | freq | with | J Zemin | Pres | via | phone)
  He freq talked with President J Zemin over the phone
Implications for MT with Vast
Amounts of Parallel Data
• Example (NP sub-constituents identified and aligned):
  [他]NP1 经常 与 [江泽民 总统]NP2 通 [电话]NP3
  [He]NP1 freq talked with [President J Zemin]NP2 over the [phone]NP3
Conclusions
• There is hope yet for wide-spread MT between many of
the world’s language pairs
• MT offers a fertile yet extremely challenging ground for
learning-based approaches that leverage diverse
sources of information:
– Syntactic structure of one or both languages
– Word-to-word correspondences
– Decomposable units of translation
– Statistical Language Models
• AVENUE’s XFER approach provides a feasible solution to
MT for languages with limited resources
• Promising approach for addressing the fundamental
weaknesses in current corpus-based MT for languages
with vast resources
Mapudungun-to-Spanish Example
English:     I didn’t see Maria
Mapudungun:  pelafiñ Maria
Spanish:     No vi a María
Mapudungun-to-Spanish Example
English:     I didn’t see Maria
Mapudungun:  pelafiñ Maria
             pe    -la    -fi      -ñ                     Maria
             see   -neg   -3.obj   -1.subj.indicative     Maria
Spanish:     No vi a María
             No    vi                             a     María
             neg   see.1.subj.past.indicative     acc   Maria
pe-la-fi-ñ Maria
[Tree: the verb root pe is parsed as V]
pe-la-fi-ñ Maria
[Tree: the negation suffix la attaches as VSuff, contributing Negation = +]
pe-la-fi-ñ Maria
[Tree: VSuff la is promoted to VSuffG, passing all features up]
pe-la-fi-ñ Maria
[Tree: the suffix fi attaches as VSuff, contributing object person = 3]
pe-la-fi-ñ Maria
[Tree: VSuffG → VSuffG VSuff combines la and fi, passing all features up from both children]
pe-la-fi-ñ Maria
[Tree: the suffix ñ attaches as VSuff, contributing person = 1, number = sg, mood = ind]
pe-la-fi-ñ Maria
[Tree: VSuffG → VSuffG VSuff combines the suffix group with ñ, passing all features up from both children]
pe-la-fi-ñ Maria
[Tree: V → V VSuffG combines pe with the suffix group, passing all features up from both children and checking that (1) negation = + and (2) tense is undefined]
pe-la-fi-ñ Maria
[Tree: Maria is parsed as N and promoted to NP, with person = 3, number = sg, human = +]
pe-la-fi-ñ Maria
[Tree: VP → V NP and S → VP complete the source parse; features are passed up from V, and the object NP is checked for human = +]
Transfer to Spanish: Top-Down
[Tree pair: the Mapudungun S/VP structure is mirrored by a Spanish S/VP structure]
Transfer to Spanish: Top-Down
[Tree pair: all features are passed to the Spanish side; the Spanish VP expands to V “a” NP]
Transfer to Spanish: Top-Down
[Tree pair: all features are passed down the Spanish tree]
Transfer to Spanish: Top-Down
[Tree pair: object features are passed down the Spanish tree]
Transfer to Spanish: Top-Down
[Tree pair: the accusative marker “a” is introduced on the object because human = +]
Transfer to Spanish: Top-Down
[Tree pair, with the transfer rule that introduces the accusative marker:]

VP::VP [VBar NP] -> [VBar "a" NP]
((X1::Y1)
 (X2::Y3)
 ((X2 type) = (*NOT* personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender)))
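Two kinds of constraints appear in this rule: “=c” is a check (the rule fails on an object NP not already marked human = +), while plain “=” shares or copies values, here propagating the object’s person, number, and gender onto the verb’s object marker. A toy rendering with f-structures as flat dicts; the real engine performs full unification, and the María features below are assumed for illustration:

def get(fs, key):
    """Look up a feature value; None if absent."""
    return fs.get(key)

def apply_accusative_rule(np_fs, vbar_fs):
    """Mimics VP::VP [VBar NP] -> [VBar "a" NP] from the slide (simplified)."""
    if get(np_fs, "human") != "+":             # ((X2 human) =c +): check, do not add
        return None                            # rule does not apply
    marker = {f: get(np_fs, f) for f in ("person", "number", "gender")}
    vbar_fs = dict(vbar_fs, objmarker=marker)  # ((Y1 objmarker person) = (Y3 person)), etc.
    return ["VBar", '"a"', "NP"], vbar_fs

maria = {"person": "3", "number": "sg", "gender": "fem", "human": "+"}
print(apply_accusative_rule(maria, {"lex": "ver"}))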
Transfer to Spanish: Top-Down
[Tree pair: person, number, and mood features are passed to the Spanish verb, and tense = past is assigned]
Transfer to Spanish: Top-Down
[Tree pair: “no” is introduced on the Spanish side because negation = +]
Transfer to Spanish: Top-Down
[Tree pair: the verb root pe is translated to Spanish ver via the transfer lexicon]
Transfer to Spanish: Top-Down
[Tree pair: generation inflects ver to vi, realizing person = 1, number = sg, mood = indicative, tense = past]
Transfer to Spanish: Top-Down
[Tree pair: features are passed over to the Spanish side, and Maria is translated as María]
I Didn’t See Maria
[Final tree pair: Mapudungun pe-la-fi-ñ Maria is translated as Spanish “No vi a María”]