LIMSI @ WMT’14 Medical Translation Task
Nicolas Pécheux (1,2), Li Gong (1,2), Quoc Khanh Do (1,2), Benjamin Marie (2,3), Yulia Ivanishcheva (2,4),
Alexandre Allauzen (1,2), Thomas Lavergne (1,2), Jan Niehues (2), Aurélien Max (1,2), François Yvon (2)
(1) Université Paris-Sud, (2) LIMSI-CNRS, (3) Lingua et Machina, (4) Centre Cochrane français
Highlights
• Subtask of sentence translation from summaries, English → French, e.g.:
  "In what circumstances do granulomatous and eosinophilic gastritis occur?"
  "What are the etiologies of dysphagia in gastroesophageal reflux disease?"
• Successful approach that makes use of two flexible translation systems

Systems
• Ncode — bilingual n-gram approach to SMT
• VSM — vector space model to perform domain adaptation
• MIRA — margin-based tuning algorithm
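The vector space model used for domain adaptation can be illustrated with a minimal sketch (illustrative only, not LIMSI's actual implementation): represent sentences as bag-of-words vectors and score each candidate training sentence by its cosine similarity to an in-domain vector.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical in-domain (medical) vocabulary sample and candidate sentences.
domain = bow("gastritis reflux disease dysphagia etiologies")
candidates = [
    "the parliament adopted the resolution",
    "dysphagia is frequent in reflux disease",
]
# Rank candidates by similarity to the in-domain vector; the score can serve
# as an extra feature in the log-linear model or for data selection.
scores = {s: cosine(bow(s), domain) for s in candidates}
best = max(scores, key=scores.get)
```

The medical sentence scores highest, so the model can promote in-domain phrase pairs over general-domain ones.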
Data sources

Corpus            Tokens (en)   weight
Coppa             10M           -3
Emea              6M            26
Pattr-Abstracts   20M           22
Pattr-Claims      32M           6
Pattr-Titles      3M            4
UMLS              8M            -7
Wikipedia         17k           -5
NewsCommentary    4M            6
Europarl          54M           -7
Giga              260M          27
all               397M          33
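One plausible reading of the per-corpus weights is as provenance features in the log-linear model, promoting or demoting phrase pairs by training-data origin; this is an assumption, and the sketch below (with hypothetical feature values) only illustrates the mechanism.

```python
# Hypothetical: each phrase pair carries one binary "provenance" feature per
# training corpus, and tuned weights (such as those in the table above)
# reward or penalize phrases extracted from each source.
weights = {"Emea": 26, "Pattr-Abstracts": 22, "UMLS": -7, "Giga": 27}

def provenance_score(origins):
    """Sum the weights of the corpora a phrase pair was extracted from."""
    return sum(weights.get(c, 0) for c in origins)

# A phrase seen in Emea and UMLS vs. one seen only in Giga.
medical_phrase = provenance_score({"Emea", "UMLS"})  # 26 - 7 = 19
general_phrase = provenance_score({"Giga"})          # 27
```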
• Combining both data sources drastically boosts performance
BLEU scores obtained by Ncode:

         medical      WMT'13       both
Devel    42.2 ± 0.1   43.0 ± 0.1   48.3 ± 0.1
Test     39.6 ± 0.1   41.0 ± 0.0   45.4 ± 0.0

• OTF — on-the-fly estimation of the parameters of a standard phrase-based model
• Soul — continuous space models working on top of conventional language models (reranking); adapted language model (LM∗)
• SysComb — combination of both systems (reranking)
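All tables report BLEU. For reference, a compact corpus-level BLEU-4 can be written as follows (an illustrative reimplementation, not the official scorer used here):

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus-level BLEU with uniform n-gram weights and brevity penalty."""
    hyp_len = ref_len = 0
    match = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n   # hypothesis n-gram counts, per order
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if 0 in match:
        return 0.0
    precision = sum(log(m / t) for m, t in zip(match, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else exp(1 - ref_len / hyp_len)
    return 100 * bp * exp(precision)
```

The ± figures in the tables come from repeated tuning runs; the scoring function itself is deterministic.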
Part-of-Speech Tagging
• Medical data exhibit different syntactic constructions and a specific vocabulary
• We use a specific model trained on medical data

PoS tagging    Devel        Test
Standard       47.9 ± 0.0   44.8 ± 0.1
Specialized    48.3 ± 0.1   45.4 ± 0.0

Proxy Test Set
• Only a small development set is available (500 sentences)
• This makes both system design and tuning challenging
• We created an internal dev/test set (LmTest) by extracting sentences from Pattr-Abstracts
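Carving an internal dev/test split out of a large in-domain corpus, as done here with Pattr-Abstracts, can be sketched as follows (sizes, seed, and function name are arbitrary):

```python
import random

def make_proxy_split(corpus, n_dev, n_test, seed=0):
    """Sample disjoint dev and test sets from an in-domain corpus; the
    remainder stays usable for training, so the held-out sentences never
    leak into the translation or language models."""
    rng = random.Random(seed)
    pool = list(corpus)
    rng.shuffle(pool)
    dev = pool[:n_dev]
    test = pool[n_dev:n_dev + n_test]
    rest = pool[n_dev + n_test:]
    return dev, test, rest

sentences = [f"sentence {i}" for i in range(1000)]  # stand-in for Pattr-Abstracts
dev, test, train = make_proxy_split(sentences, n_dev=100, n_test=100)
```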
Tuned on       Devel        LmTest       NewsTest12   Test
Devel          48.3 ± 0.1   46.8 ± 0.1   26.2 ± 0.1   45.4 ± 0.0
LmTest         41.8 ± 0.2   48.9 ± 0.1   18.5 ± 0.1   40.1 ± 0.1
NewsTest12     39.8 ± 0.1   37.4 ± 0.2   29.0 ± 0.1   39.0 ± 0.3
System              Devel   Test
Ncode               48.5    45.2
 + Soul LM∗         49.8    45.9
 + Soul LM∗ + TM    50.1    47.0
OTF                 46.6    42.5
 + VSM              46.9    42.8
 + Soul LM∗         48.4    44.2
 + Soul LM∗ + TM    49.7    44.9
SysComb             50.7    46.5
• Ncode outperforms OTF by 2.8 BLEU points
• The vector space model does not yield any improvement here
• Continuous space language models yield gains of up to 2 BLEU points
• The system combination gain does not transfer to the test set
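System combination by reranking can be sketched as follows (a minimal illustration under assumed score conventions, not the combination actually deployed): merge the n-best lists of both systems and rescore every hypothesis with a weighted combination of the two model scores.

```python
def rerank(nbest_a, nbest_b, weights=(0.5, 0.5)):
    """Merge two n-best lists of (hypothesis, model_score) pairs (higher is
    better) and return the hypothesis with the best combined score. A
    hypothesis missing from one list backs off to that system's worst
    score (a simple heuristic)."""
    wa, wb = weights
    floor_a = min(s for _, s in nbest_a)
    floor_b = min(s for _, s in nbest_b)
    sa, sb = dict(nbest_a), dict(nbest_b)
    combined = {hyp: wa * sa.get(hyp, floor_a) + wb * sb.get(hyp, floor_b)
                for hyp in set(sa) | set(sb)}
    return max(combined, key=combined.get)

# Hypothetical n-best lists from the two systems (log scores, higher = better).
ncode = [("la dysphagie est fréquente", -1.0), ("dysphagie est fréquent", -2.5)]
otf = [("la dysphagie est fréquente", -1.2), ("la dysphagie fréquente", -2.0)]
best = rerank(ncode, otf)
```

A hypothesis ranked well by both systems wins over one preferred by a single system, which is the intuition behind the SysComb gains on the development set.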
Conclusions
• Moderate to high-quality translations
• The lack of an internal test set made development challenging

Error Analysis
Error type          SysComb   OTF+VSM+Soul
extra word          4         4
missing content     13        4
missing filler      20        31
incorrect disamb.   47        44
incorrect form      62        82
incorrect style     8         6
incorrect term      18        20
word order          21        42
unknown word        1         3
unknown term        11        12
all                 205       248

Manual error analysis following Vilar et al. (2006) for the first 100 test sentences.
• More careful integration of medical terminology proved necessary
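The raw counts above can be normalized into per-category shares to compare the two systems' error profiles (counts copied from the table; the grouping of sub-categories follows the Vilar et al. taxonomy):

```python
# Error counts per category as (SysComb, OTF+VSM+Soul), from the table above.
errors = {
    "extra word": (4, 4),
    "missing content": (13, 4),
    "missing filler": (20, 31),
    "incorrect disamb.": (47, 44),
    "incorrect form": (62, 82),
    "incorrect style": (8, 6),
    "incorrect term": (18, 20),
    "word order": (21, 42),
    "unknown word": (1, 3),
    "unknown term": (11, 12),
}
# Column totals reproduce the "all" row: (205, 248).
totals = tuple(sum(col) for col in zip(*errors.values()))
# Share (%) of each error type within each system's total error count.
shares = {cat: tuple(round(100 * c / t, 1) for c, t in zip(counts, totals))
          for cat, counts in errors.items()}
```

Incorrect word forms dominate for both systems, which is consistent with the conclusion that medical terminology needs more careful handling.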