Tree Automata for Automatic Language Translation
Kevin Knight
Information Sciences Institute, University of Southern California

Outline
• History of the World (of Automata in NLP)
• Weighted string automata in NLP
  – Applications
    • transliteration
    • machine translation
    • language modeling
    • speech, lexical processing, tagging, summarization, optical character recognition, …
  – Generic algorithms and toolkits
• Weighted tree automata in NLP
  – Applications
  – Generic algorithms and toolkits
• Some connections with theory

History of the World
• consonant/vowel sequences in Pushkin novels [Markov 1913]
• noisy channel model, cryptography [Shannon 1948]
• context free grammars [Chomsky 1956]
• transformational grammars [Chomsky 1957]
• tree transducers, to formalize transformational grammars [Rounds 1970] & [Thatcher 1970]
• tree automata survey article [Thatcher 1973]

Transformational Grammar
[Slide diagram: parse trees relating the active sentence "the boy saw the door" to its passive transform "the door was seen by the boy".]

[Thatcher 1973]: "The number one priority in the area [of tree automata theory] is a careful assessment of the significant problems concerning natural language and programming language semantics and translation. If such problems can be found and formulated, I am convinced that the approach informally surveyed here can provide a unifying framework within which to study them."

History of the World
[Slide diagram: three communities — Tree Automata Theory, Linguistics, Computers.]
• THEORY: Let's prove theorems!
• LINGUISTICS: Let's drop formalism until we understand things better!
• NATURAL LANGUAGE PROCESSING: Let's build demo systems!

Natural Language Processing
• 1970-80s
  – models of English syntax, demonstration grammars
  – beyond CFG
    • augmented transition networks (ATN)
    • unification-based grammars (HPSG, LFG, ...)
      – mostly turned out to be formally equivalent to each other … and to Turing machines
    • tree-adjoining grammar (TAG), categorial grammar
      – mildly context-sensitive grammars
• Meanwhile, in speech recognition…
  – probabilistic finite-state grammars of English
  – built automatically from training data (corpus)
  – word n-grams
  – successful paradigm

Natural Language Processing
• 1993 – US agency DARPA presided over forced marriage of speech and language research
• 1990s – NLP dominated by probabilistic finite-state string formalisms and automatic training
  – weighted FSA/FST toolkits
• 2000s – re-awakened interest in tree formalisms for modeling syntax-sensitive operations

Back to the Outline: weighted string automata in NLP — applications, generic algorithms and toolkits.

Natural Language Transformations (Input → Output)
• Machine Translation
• Name Transliteration
• Compression
• Question Answering
• Spelling Correction
• Speech Recognition
• Language Generation
• Text to Speech

Finite-State (String) Transducer (FST)
Original input: k n i g h t
FST transitions (input : output):
  q  --k : *e*--> q2
  q2 --n : N  --> q
  q  --i : AY --> q
  q  --g : *e*--> q3
  q3 --h : *e*--> q4
  q4 --t : T  --> qfinal
Stepping through the input one symbol at a time, the FST transforms "k n i g h t" into "N AY T".
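To make the transduction concrete, here is a minimal Python sketch (my own illustration, not from the talk) that encodes the six transitions above as a dictionary and steps through the input:

```python
# Each transition is (state, input_symbol) -> (next_state, output_symbol or None).
TRANSITIONS = {
    ("q",  "k"): ("q2", None),    # k : *e*
    ("q2", "n"): ("q",  "N"),     # n : N
    ("q",  "i"): ("q",  "AY"),    # i : AY
    ("q",  "g"): ("q3", None),    # g : *e*
    ("q3", "h"): ("q4", None),    # h : *e*
    ("q4", "t"): ("qfinal", "T"), # t : T
}

def transduce(symbols, start="q", final="qfinal"):
    """Run the (deterministic) FST over the input; return None if it gets stuck."""
    state, output = start, []
    for sym in symbols:
        if (state, sym) not in TRANSITIONS:
            return None
        state, out = TRANSITIONS[(state, sym)]
        if out is not None:
            output.append(out)
    return output if state == final else None

print(transduce(list("knight")))  # ['N', 'AY', 'T']
```

A weighted version would attach a probability to each transition and multiply the probabilities along the accepted path.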
Transliteration
Angela Knight --transliteration--> a n ji ra na i to
Frequently occurring translation problem across languages with different sound systems and character sets (Japanese, Chinese, Arabic, Russian, English, …). Can't be solved by dictionary lookup.

Forward and Backward Transliteration
• forward transliteration: Angela Knight → a n ji ra na i to (some variation allowed)
• backward transliteration: a n ji ra na i to → Angela Knight (no variation allowed)

Practical Problem: Transliteration WFST
[Slide diagram: a weighted FST relating the 7-symbol Japanese sound sequence "a n ji ra na i to" to the 13-symbol English spelling "Angela Knight".]

Transliteration (noisy channel framework)
• WFSA P(e): generate/accept well-formed English sequences
• WFST P(k | e): make transformations without worrying too much about context
• DECODE: argmax_e P(e | k) = argmax_e P(e) P(k | e)

Transliteration "generative story"
WFSA A → Angela Knight → WFST B → AE N J EH L UH N AY T → WFST C → a n j i r a n a i t o → WFST D
Decoding runs the cascade backwards, producing millions of candidates at each level, e.g. AE N J IH R UH N AY T, AH N J IH L UH N AY T OH, …

Machine Translation
美国关岛国际机场及其办公室均接获一名自称沙地阿拉伯富商拉登等发出的电子邮件,威胁将会向机场等公众地方发动生化袭击後,关岛经保持高度戒备。
The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport.

Machine Translation: direct model vs. noisy channel model — "I see a Spanish sentence on the page. How did it get there?"

Machine Translation [Brown et al 93] [Knight & Al-Onaizan 98]
[Slide diagrams: the translation model built up as a cascade of weighted automata and transducers, WFSA/WFST A through E.]

Other Applications of Weighted String Automata in NLP
• speech recognition [Pereira, Riley, Sproat 94]
• lexical processing
  – word segmentation [Sproat et al 96]
  – morphological analysis/generation [Kaplan and Kay 94; Clark 02]
• tagging
  – part of speech tagging [Church 88]
  – name finding
• summarization [Zajic, Dorr, Schwartz 02]
• optical character recognition [Kolak, Byrne, Resnik 03]
• decipherment [Knight et al 06]

Algorithms for String Automata
• N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998)
• EM training: forward-backward EM (Baum & Welch, 1971; Eisner, 2001)
• Determinization: … of weighted string acceptors (Mohri, 1997)
• Intersection: WFSA intersection
• Application: string ∘ WFST → WFSA
• Transducer composition: WFST composition (Pereira & Riley, 1996)

String Automata Toolkits Used in NLP
• Unweighted: Xerox finite-state calculus, plus many children
• Weighted: AT&T FSM, plus many children — Google OpenFST, ISI Carmel, Aachen FSA, DFKI FSM toolkit, MIT FST toolkit, …

% echo 'a n ji ra ho re su te ru na i to' | carmel -rsi -k 5 -IEQ word.names.50000wds.transducer word-epron.names.55000wds.transducer epron-jpron.1.transducer jpron.transducer vowel-separator.transducer jpron-asciikana.transducer
ANGELA FORRESTAL KNIGHT    2.60e-20
ANGELA FORRESTER KNIGHT    6.00e-21
ANGELA FOREST EL KNIGHT    1.91e-21
ANGELA FORESTER KNIGHT     1.77e-21
ANGELA HOLLISTER KNIGHT    1.33e-21
(The first model file is a wfsa; the remaining five are wfsts.)

The Beautiful World of Composable Transducers
[Slide diagram: a network of composable weighted models connecting representation levels — P(e) over English word sequences, P(p|e) and P(e|p) between English word and phoneme sequences, P(l|e) to long English word sequences, P(r|e), P(r), P(f|r), P(f|e), and P(e|f) linking English to foreign word and phoneme sequences. Any path through the network defines a transducer cascade.]
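The cascade idea can be illustrated with a toy sketch: compose a word-to-phoneme table with a phoneme-to-Japanese-sound table to get a word-to-sound mapping. Both tables and all weights below are invented for illustration; a real toolkit such as Carmel composes the actual WFSTs lazily rather than enumerating every output.

```python
# Compose two weighted mappings by chaining them and multiplying weights.
word_to_phonemes = {
    "KNIGHT": [(0.9, ("N", "AY", "T")), (0.1, ("K", "N", "IH", "T"))],
}
phoneme_to_sound = {
    "N": [(1.0, ("na",))], "AY": [(0.8, ("i",)), (0.2, ("a", "i"))], "T": [(1.0, ("to",))],
    "K": [(1.0, ("ku",))], "IH": [(1.0, ("i",))],
}

def compose(word):
    """Enumerate (weight, sound sequence) pairs for a word through both mappings."""
    results = []
    for w1, phonemes in word_to_phonemes[word]:
        seqs = [(w1, ())]
        for p in phonemes:
            seqs = [(w * w2, seq + out) for (w, seq) in seqs
                                        for (w2, out) in phoneme_to_sound[p]]
        results.extend(seqs)
    return sorted(results, reverse=True)

print(compose("KNIGHT")[0])   # (0.72, ('na', 'i', 'to'))
```

In the noisy-channel setup described above, each candidate would additionally be weighted by the language model P(e), and decoding takes the argmax over the product.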
Finite-State String Transducers
• Nice properties
• Nice toolkits
[Slide chart: translation accuracy of phrase substitution/transposition systems over the 2002–2006 NIST Common Evaluations.]

Finite-State String Transducers
• Not expressive enough for many problems!
• For example, machine translation:
  – Arabic to English: move the verb from the beginning of the sentence to the middle (in between the subject and object)
  – Chinese to English: when translating noun-phrase "de" noun-phrase, flip the order of the noun-phrases & substitute "of" for "de"

Experimental Progress in Statistical Machine Translation
[Slide chart: translation accuracy over the 2002–2006 NIST Common Evaluations — phrase substitution without linguistic categories vs. tree transformation with linguistic categories.]

Syntax Started to Be Helpful in 2006
[Slide chart: Chinese/English translation accuracy from April 2005 to February 2007, for string-based systems on sentences under 16 words (NIST-03/04) and on all sentences (NIST-2003).]

String-Based Output
枪手 被 警方 击毙 .
• Decoder Hypothesis #1: Gunman of police killed .
• Decoder Hypothesis #7: Gunman of police attack .
• Decoder Hypothesis #12: Gunman by police killed .
• Decoder Hypothesis #134: Killed gunman by police .
• Decoder Hypothesis #9,329: Gunman killed the police .
• Decoder Hypothesis #50,654: Gunman killed by police .  (highest scoring output, phrase-based model)

Problematic:
• VBD "killed" needs a direct object
• VBN "killed" needs an auxiliary verb ("was")
• countable "gunman" needs an article ("a", "the")
• "passive marker" in Chinese controls re-ordering
Can't enforce/encourage any of this!
Tree-Based Output
枪手 被 警方 击毙 .
• Decoder Hypothesis #1: The gunman killed by police .  (DT NN VBD IN NN under NPB, PP, NP-C, VP, S)
• Decoder Hypothesis #16: Gunman by police shot .  (NN IN NN VBD)
• Decoder Hypothesis #1923: The gunman was killed by police .  (DT NN AUX VBN IN NN)  (highest scoring output, syntax-based model)

OK, so how does a Chinese string transform into an English tree, or vice-versa?

Back to the Outline: weighted tree automata in NLP — applications, generic algorithms and toolkits.

Top-Down Tree Transducer (W. Rounds 1970; J. Thatcher 1970)
Original input:
  S(NP(PRO(he)) VP(VBZ(enjoys) NP(SBAR(VBG(listening) VP(P(to) NP(music))))))
The transducer rewrites the tree top-down, one state-marked subtree at a time. Two of the rules used:
  qS S(x0:NP VP(x1:VBZ x2:NP)) → s x0 , wa , r x2 , ga , q x1     (weight 0.2)
  s NP(PRO(he)) → kare                                            (weight 0.7)
Intermediate transformation (after the two rules above):
  kare , wa , r NP(SBAR(…)) , ga , q VBZ(enjoys)
Final output:
  kare , wa , ongaku , o , kiku , no , ga , daisuki , desu
To get the total probability, multiply the probabilities of the individual steps.
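Here is a minimal sketch of how the two rules above might be applied and their weights multiplied; the nested-tuple tree encoding and helper functions are my own, while the rules and the weights 0.2 and 0.7 come from the slides.

```python
# Trees are nested tuples: (label, child1, child2, ...).
tree = ("S",
        ("NP", ("PRO", "he")),
        ("VP", ("VBZ", "enjoys"),
               ("NP", ("SBAR", "..."))))      # lower subtree abbreviated

def rule_q_S(t):
    """qS S(x0:NP VP(x1:VBZ x2:NP)) -> s x0 , wa , r x2 , ga , q x1   (weight 0.2)"""
    label, x0, vp = t
    _, x1, x2 = vp
    return [("s", x0), "wa", ("r", x2), "ga", ("q", x1)], 0.2

def rule_s_NP_he(t):
    """s NP(PRO(he)) -> kare   (weight 0.7)"""
    return ["kare"], 0.7

output1, p1 = rule_q_S(tree)                  # first step, at the root
output2, p2 = rule_s_NP_he(output1[0][1])     # rewrite the 's'-state subtree
print(p1 * p2)                                # 0.2 * 0.7 = 0.14 so far
```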
Top-Down Tree Transducer
• Introduced by Rounds (1970) & Thatcher (1970):
  "Recent developments in the theory of automata have pointed to an extension of the domain of definition of automata from strings to trees … parts of mathematical linguistics can be formalized easily in a tree-automaton setting … Our results should clarify the nature of syntax-directed translations and transformational grammars …"  (Rounds 1970, "Mappings on Grammars and Trees", Math. Systems Theory 4(3))
• Large theory literature — e.g., Gécseg & Steinby (1984), Comon et al (1997)
• Once again re-connecting with NLP practice — e.g., Knight & Graehl (2005), Galley et al (2004, 2006)

Tree Transducers Can Be Extracted from Bilingual Data (Galley, Hopkins, Knight, Marcu, 2004)
Aligned sentence pair:
  i felt obliged to do my part
  我 有 责任 尽 一份 力
[Slide diagram: the English parse tree for "i felt obliged to do my part" (nodes S, NP-C, VP, VP-C, VBD, SG-C, VBN, TO, NPB, PRP, VB, PRP$, NN), word-aligned to the Chinese string.]
RULES ACQUIRED:
  VBD(felt) -> 有
  VBN(obliged) -> 责任
  VB(do) -> 尽
  NN(part) -> 一份 力
  VP-C(x0:VBN x1:SG-C) -> x0 x1
  VP(TO(to) x0:VP-C) -> x0
  S(x0:NP-C x1:VP) -> x0 x1
  …
This is a tree-to-string transducer, used (noisy-channel-wise) to do string-to-tree translation.
Additional extraction methods: (Galley et al, 2006), (Marcu et al, 2006). Current systems learn ~500m rules.

Sample "said that" rules
VP(VBD("said") SBAR-C(IN("that") x0:S-C)) ->
  0.57  说 , x0
  0.09  说 x0
  0.02  他 说 , x0
  0.02  指出 , x0
  0.02  x0
  0.01  表示 x0
  0.01  说 , x0 的

Sample Subject-Verb-Object Rules
S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.) ->
  CHINESE / ENGLISH
    0.82  x0 x1 x2 x3
    0.02  x0 x1 , x2 x3
    0.01  x0 , x1 x2 x3
  ARABIC / ENGLISH
    0.54  x0 x1 x2 x3
    0.44  x1 x0 x2 x3
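As a small illustration of how such weighted rule tables might be stored and queried, here is a sketch; the source-side pattern and the weights are the ones on the slide, while the dictionary layout and the language-pair keys are my own.

```python
# One English-side tree pattern, several weighted foreign-side orderings per language pair.
# Pattern: S(x0:NP-C VP(x1:VBD x2:NP-C) x3:.)
SVO_RULES = {
    "chinese-english": [
        (0.82, "x0 x1 x2 x3"),
        (0.02, "x0 x1 , x2 x3"),
        (0.01, "x0 , x1 x2 x3"),
    ],
    "arabic-english": [
        (0.54, "x0 x1 x2 x3"),   # subject-verb-object order kept
        (0.44, "x1 x0 x2 x3"),   # verb moved in front of the subject
    ],
}

for pair, rules in SVO_RULES.items():
    best_p, best_rhs = max(rules)
    print(pair, "->", best_rhs, f"(p={best_p})")
```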
Decoding
• argmax_etree P(etree | cstring)
• Difficult search problem
  – Bottom-up CKY parser
  – Builds English constituents on top of Chinese spans
  – Record of rule applications (the derivation) provides information to construct English tree
  – Returns k-best trees

Syntax-Based Decoding
Rules apply when their right-hand sides (RHS) match some portion of the input:
  这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
  RULE 1:  DT(these) -> 这
  RULE 2:  VBP(include) -> 中包括
  RULE 4:  NNP(France) -> 法国
  RULE 5:  CC(and) -> 和
  RULE 6:  NNP(Russia) -> 俄罗斯
  RULE 8:  NP(NNS(astronauts)) -> 宇航 , 员
  RULE 9:  PUNC(.) -> .
  RULE 10: NP(x0:DT, CD(7), NNS(people)) -> x0 , 7人
  RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) -> 来自 , x0
  RULE 13: NP(x0:NNP, x1:CC, x2:NNP) -> x0 , x1 , x2
  RULE 14: VP(x0:VBP, x1:NP) -> x0 , x1
  RULE 15: S(x0:NP, x1:VP, x2:PUNC) -> x0 , x1 , x2
  RULE 16: NP(x0:NP, x1:VP) -> x1 , 的 , x0
The decoder first covers the input with the lexical rules ("these", "include", "France", "and", "Russia", "astronauts", "."), then builds "France and Russia" (RULE 13), "coming from France and Russia" (RULE 11), "astronauts coming from France and Russia" (RULE 16), "these 7 people" (RULE 10), "include astronauts coming from France and Russia" (RULE 14), and finally "These 7 people include astronauts coming from France and Russia ." (RULE 15).

Derivation Tree
The record of rule applications: RULE 15 over RULES 10, 14, and 9; RULE 14 over RULES 2 and 16; RULE 16 over RULES 8 and 11; RULE 11 over RULE 13; and so on down to the lexical rules.

Derived English Tree
S( NP(DT(These) CD(7) NNS(people))
   VP(VBP(include)
      NP(NP(NNS(astronauts))
         VP(VBG(coming)
            PP(IN(from)
               NP(NNP(France) CC(and) NNP(Russia))))))
   PUNC(.) )
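Below is a highly simplified sketch of the bottom-up matching idea: lexical rules seed a chart with English constituents over Chinese spans, and a rule with variables combines adjacent spans. It covers only the "France and Russia" fragment; the real decoder is a CKY parser over all spans with k-best extraction.

```python
sentence = ["法国", "和", "俄罗斯"]

lexical_rules = {            # RULES 4, 5, 6 from the slides
    "法国":  ("NNP", "France"),
    "和":    ("CC",  "and"),
    "俄罗斯": ("NNP", "Russia"),
}

chart = {}                   # (start, end) -> (english_label, english_string)
for i, tok in enumerate(sentence):
    if tok in lexical_rules:
        chart[(i, i + 1)] = lexical_rules[tok]

# RULE 13: NP(x0:NNP, x1:CC, x2:NNP) -> x0 , x1 , x2
if [chart[(i, i + 1)][0] for i in range(3)] == ["NNP", "CC", "NNP"]:
    english = " ".join(chart[(i, i + 1)][1] for i in range(3))
    chart[(0, 3)] = ("NP", english)

print(chart[(0, 3)])         # ('NP', 'France and Russia')
```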
Chinese/English Translation Examples
Chinese gloss: six unit Iraq civilian today in Iraq south part possessive protest in , police and UK troops shot killed .
Machine translation: Police and British troops shot and killed six Iraqi civilians in protests in southern Iraq today.

Chinese: 印度 目前 共 有 74 种 控价 药 , 增加 后 的 控价 药品 将 占 印度 所售 药品 的 40% 以上 。
Machine translation: Currently, a total of 74 types of medicine prices increased after the price of medicines will account for more than 40 per cent of medicines sold by India.

Arabic-English translation (a bad output tree, annotated):
• First, this is not a sentence. The VP below is not finite (e.g., "visited Iran").
• Second, even if the S-C really were a sentence, the verb "discussed" doesn't take an S argument. So this is a bogus VP.
• Third, even if the lower VP weren't bogus, "confirms" only takes a certain type of VP, namely a gerund ("confirms discussing the idea").

Tree Automata Operations for Machine Translation?
argmax_etree P(etree | cstring)
e = yield(best-tree(intersect(lm.rtg, b-apply(cstring, tm.tt))))
• lm.rtg: weighted tree grammar that accepts/scores English trees
• tm.tt: weighted tree-to-string transducer that turns English trees into Chinese strings

Tree Automata Algorithms (string case / tree case)
• N-best: … paths through a WFSA (Viterbi, 1967; Eppstein, 1998) / … trees in a weighted forest (Jiménez & Marzal, 2000; Huang & Chiang, 2005)
• EM training: forward-backward EM (Baum/Welch, 1971; Eisner, 2003) / tree transducer EM training (Graehl & Knight, 2004)
• Determinization: … of weighted string acceptors (Mohri, 1997) / … of weighted tree acceptors (Borchardt & Vogler, 2003; May & Knight, 2005)
• Intersection: WFSA intersection / tree acceptor intersection (despite CFGs not being closed under intersection)
• Applying transducers: string ∘ WFST → WFSA / tree ∘ TT → weighted tree acceptor
• Transducer composition: WFST composition (Pereira & Riley, 1996) / many tree transducers not closed under composition (Rounds 70; Engelfriet 75)

Tree Automata Toolkits Used in NLP
• Tiburon: weighted tree automata toolkit [May & Knight 06]
  – Developed by Jonathan May, USC/ISI
  – First version distributed in April 2006
  – Includes tutorial
  – Inspired by string automata toolkits
  – www.isi.edu/licensed-sw/tiburon

% echo "A(B(C) B(B(C)))" | tiburon -k 1 - even.rtg three.rtg
A(B(C) B(B(C))): 3.16E-9
% echo "A(B(C) B(C))" | tiburon -k 1 - even.rtg three.rtg
Warning: returning fewer trees than requested
0
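To make the lm.rtg role concrete — a weighted regular tree grammar that scores English trees, in the spirit of the .rtg files fed to Tiburon above — here is a toy Python sketch; the grammar, its states, and its weights are invented for illustration.

```python
# state -> list of (weight, label, child_states); trees are nested tuples, leaves are strings.
RTG = {
    "qS":  [(1.0, "S",  ("qNP", "qVP"))],
    "qNP": [(0.6, "NP", ("qDT", "qNN")), (0.4, "NP", ("qNN",))],
    "qVP": [(1.0, "VP", ("qVB", "qNP"))],
    "qDT": [(1.0, "the", ())],
    "qNN": [(0.5, "gunman", ()), (0.5, "police", ())],
    "qVB": [(1.0, "shot", ())],
}

def score(state, tree):
    """Return the best weight with which `state` derives `tree` (0.0 if none)."""
    label, children = (tree, ()) if isinstance(tree, str) else (tree[0], tree[1:])
    best = 0.0
    for weight, lab, child_states in RTG[state]:
        if lab == label and len(child_states) == len(children):
            w = weight
            for cs, c in zip(child_states, children):
                w *= score(cs, c)
            best = max(best, w)
    return best

tree = ("S", ("NP", "the", "gunman"), ("VP", "shot", ("NP", "police")))
print(score("qS", tree))   # 0.6 * 0.5 * 0.4 * 0.5 = 0.06
```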
Back to the Outline: some connections with theory.

Desirable Properties of a Transducer Formalism
• Expressiveness — can express the knowledge needed to capture the transformation & solve the linguistic problem
• Modularity — can integrate smaller components into bigger systems, co-ordinate search
• Inclusiveness — encompasses simpler formalisms
• Teachability — can learn from input/output examples

Desirable Formal Properties of a Transformation Formalism
• Expressiveness: see next few slides (e.g., do local rotation)
• Modularity: be closed under composition
• Inclusiveness: capture any transformation that a string-based FST can
• Teachability: given input/output tree pairs, find locally optimal rule probabilities in low-polynomial time

Expressiveness: some necessary things for machine translation
• Phrasal translation: VP(VBZ(is) VBG(singing)) ↔ está cantando
• Non-constituent phrases: "there are" ↔ hay
• Non-contiguous phrases: "put X on" ↔ poner X
• Context-sensitive word insertion/deletion: e.g., the article "the" under NPB
• Re-ordering
• Lexicalized re-ordering: e.g., "X of Y" ↔ "Y X"

Expressiveness: local rotation
English S(NP(DT(the) N(boy)) VP(V(saw) NP(DT(the) N(door))))
↔ Arabic S'(CONJ(wa [and]) V(ra'aa [saw]) NP(N(atefl [the boy])) NP(N(albab [the door])))

How do different tree formalisms fare?

Top-down Tree Transducers (T – top-down, L – linear/non-copying, N – non-deleting)
• Every rule has a one-level LHS and a multilevel RHS, e.g. for Arabic VSO → English SVO:
    q S(x0 x1 x2) → S(x1 VP(x0 x2))      (x0 = arabic verb, x1 = arabic subject, x2 = arabic object)
• LNT: linear and non-deleting
• LT: can delete subtrees
• T: can copy & delete subtrees
• All employ states (qS, q, r, s, …) on the RHS variables.

Expressiveness gap: with only a one-level LHS, an LNT transducer cannot rotate S(PRO VP(V NP)) into S(V PRO NP) in a single step, because the rule at the S node cannot see the V and NP inside the VP.

Extended (x-) Transducers
• Multilevel LHS, e.g. S(x0:english-subject VP(x1:english-verb x2:english-object)) → x1 x0 x2 — a rule can grab more structure.
• Possibility mentioned in [Rounds 70]; defined in [Graehl & Knight 04]; used for practical MT by [Galley et al 04, 06].

[Slide diagrams: the lattice of transducer classes over the copying/non-copying and deleting/non-deleting axes — T, LT, LNT (Gécseg & Steinby '84) and their extended-LHS counterparts xT (xTR = TR), xLT, xLNT (Graehl & Knight '04; plus finite-check before delete; plus local rotation).]
Expressive power theorems in [Maletti, Graehl, Hopkins, Knight, to appear SIAM J. Comput.]
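A small sketch of the local-rotation point: an extended-LHS rule can match two levels of structure at once, roughly S(x0:PRO VP(x1:V x2:NP)) → S(x1 x0 x2), which no single one-level-LHS rule can express. The Python encoding below is my own illustration.

```python
# Trees are nested tuples: (label, child1, child2, ...).
def rotate(tree):
    """Apply the xLNT-style two-level rule if its pattern matches, else return None."""
    if tree[0] == "S" and len(tree) == 3:
        x0, vp = tree[1], tree[2]
        if x0[0] == "PRO" and vp[0] == "VP" and len(vp) == 3:
            x1, x2 = vp[1], vp[2]
            if x1[0] == "V" and x2[0] == "NP":
                return ("S", x1, x0, x2)      # verb-subject-object order
    return None

english = ("S", ("PRO", "he"), ("VP", ("V", "saw"), ("NP", "the door")))
print(rotate(english))   # ('S', ('V', 'saw'), ('PRO', 'he'), ('NP', 'the door'))
```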
[Slide diagrams: which classes are expressive enough for local rotation and which are closed under composition — the extended classes xT, xLT, xLNT handle local rotation; the bottom-up classes B and LB (= LTR) are closed under composition.]

Inclusiveness
• Tree transducers are described as generalizing FSTs (strings are "long skinny trees").
• An FST transition q --A/B--> r corresponds to the tree transducer rule q A(x0) → B(r x0); a deleting transition q --A/*e*--> r corresponds to q A(x0) → r x0.
• But an inserting transition q --*e*/B--> r would need a rule like q x → B(r x), and such rules are not part of traditional tree transducers, which must consume a symbol at each step.
[Slide diagram: which classes are expressive enough, closed under composition, and generalize the FST — xT, T, B, MBOT, xLT, LB (= LTR), LT, and the xLNT/LNT variants with extended RHS, input-epsilon, and output-epsilon rules, compared against FST and GSM.]

More Theory Connections
• Other desirable properties
  – More expressivity
  – Other types of teachability
  – Process trees horizontally as well as vertically
  – Graph transduction
• Papers
  – Overview of tree automata in NLP [Knight & Graehl 05]
  – MT Journal [Knight 07]
  – SIAM J. Comput. [Maletti et al, forthcoming]
  – CIAA, e.g., Tiburon paper [May & Knight 06]
  – WATA (Weighted Automata Theory and Applications)
  – FSMNLP (Finite-State Methods and Natural Language Processing); subworkshop "Tree Automata and Transducers" (papers due 4/13/09)

Conclusion
• Weighted string automata for NLP – well understood and exploited
• Weighted tree automata for NLP – just starting
• Some connections with theory – of continuing interest
• Good news from the empirical front – making good progress on machine translation