Automatic phonetic transcription for Danish ASR
Andreas Søeborg Kirkedal
AT&T & CBS
31 January 2014

Outline
1. ASR
2. Automatic Speech Recognition
   - What happens?
   - Focus
3. Dictionaries
   - Automatic transcription
4. Data
   - Text
   - Speech
   - Systems
5. Experiments
   - Kaldi ASR systems
6. Is it good enough?
   - Critique

What happens?
This is what happens. [Figure]

Focus
The topic of today: the pronunciation dictionary.

Overview
An ASR system combines an acoustic model (AM), a pronunciation dictionary (PD) and a language model (LM). Example PD entries:

  Word      Gloss               Transcription
  AFVIST    rejected            A w v i: s d-0
  BIAVL     beekeeping          b-0 i A w l
  FORÆRE    give as a present   f V E: 6
  TRETTEN   thirteen            t-h R { d-0 @ n

[Figure: AM, PD and LM; phones such as b-0 and 6 are matched to PD entries such as BIAVL and FORÆRE.]

Dictionaries
- Cost barrier
  - Large
  - Out-of-vocabulary errors
  - Domain
  - Expensive
- Pronunciation variants
- Quality
- Genre
  - Read-aloud speech
  - Dictation

Automatic transcription
- Automatic transcription is not as good as manual transcription
  - Not always
- But it is:
  - Cheaper
  - Quicker
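To make the pronunciation dictionary (PD) and the out-of-vocabulary (OOV) problem concrete, here is a minimal Python sketch of extending a dictionary with automatic transcriptions. It assumes a Kaldi-style lexicon.txt layout (a word followed by its space-separated phones, one pronunciation per line); the four entries are the examples from the overview slide. All function names are illustrative, and `toy_transcribe` is a hypothetical stand-in for an automatic transcriber such as eSpeak or Phonix: it calls neither tool.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical lexicon in a Kaldi-style lexicon.txt layout:
# one pronunciation per line, "WORD phone1 phone2 ...".
# Entries are the four examples from the overview slide.
LEXICON_TEXT = """\
AFVIST A w v i: s d-0
BIAVL b-0 i A w l
FORÆRE f V E: 6
TRETTEN t-h R { d-0 @ n
"""

Lexicon = Dict[str, List[List[str]]]


def load_lexicon(text: str) -> Lexicon:
    """Parse lexicon lines; a word listed more than once keeps every variant."""
    lexicon: Lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        lexicon.setdefault(word, []).append(phones)
    return lexicon


def find_oov(tokens: Iterable[str], lexicon: Lexicon) -> Set[str]:
    """Return the distinct tokens that have no dictionary entry."""
    return {t for t in tokens if t not in lexicon}


def oov_rate(tokens: List[str], lexicon: Lexicon) -> float:
    """Fraction of running tokens (not distinct types) that are OOV."""
    if not tokens:
        return 0.0
    return sum(t not in lexicon for t in tokens) / len(tokens)


def extend_lexicon(lexicon: Lexicon, words: Iterable[str],
                   transcribe: Callable[[str], List[str]]) -> None:
    """Add one automatic transcription per word, modifying lexicon in place."""
    for word in sorted(words):
        lexicon.setdefault(word, []).append(transcribe(word))


def toy_transcribe(word: str) -> List[str]:
    # Hypothetical placeholder: one "phone" per letter. A real system would
    # obtain the transcription from a tool such as eSpeak or Phonix instead.
    return list(word.lower())


if __name__ == "__main__":
    lexicon = load_lexicon(LEXICON_TEXT)
    tokens = ["BIAVL", "TRETTEN", "HONNING", "TRETTEN"]
    print(oov_rate(tokens, lexicon))  # 0.25
    extend_lexicon(lexicon, find_oov(tokens, lexicon), toy_transcribe)
    print(oov_rate(tokens, lexicon))  # 0.0
```

With the toy data, HONNING (honey) is the only OOV token, so the OOV rate drops from 0.25 to 0.0 once an automatic transcription is added. Because every line is kept as a separate variant, a word with several pronunciations, such as the two transcriptions of TRETTEN in the combination dictionary, retains all of them.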
Can it be good enough?

Automatic transcription: eSpeak
- Open-source TTS engine
- Downloadable
- Multiple languages supported (ca. 50)
- Advanced transcription
  - Includes suprasegmental features
  - Variant of IPA
- Quality
  - Accuracy is very important
  - Maturity of each language is specified

eSpeak [Duddington, 2010]
- Rewriting system (shallow)
  - Spelling-to-phoneme rules
  - Case-by-case rules
- Developed by several people
  - Quality depends on the developer
  - Danish (current developer): native speaker, language professional, linguistics background?
- Complexity
  - 8600 spelling-to-phoneme rules
  - 11000 word/word-class rules

Phonix [Henrichsen, 2014]
- Linguistic preprocessing
  - Deep morphological analysis
  - Decomposition
- Several fall-back strategies
- Several knowledge bases (KBs)
  - Dictionaries
  - Letter-to-sound rules
  - KBs must agree on a transcription
You just heard all of this in more detail anyway.

Text data
- Proprietary
  - General
  - Commands
  - Names
  - Spelling
  - Numbers
  - Medical diagnoses
- Public
  - Danish parliament (Folketinget)

Text preparation
1. Uniform encoding
2. Cleaning up text
   - Remove duplicates, e.g. '4' : 'fire' ("four")
3. Handle abbreviations
4. Tokenisation
5. Expand dates and numbers

Speech
Mirsk parallel data
- 16 kHz, mono
- Genre
  - Read-aloud speech
  - Dictation
- Gender
- Accent and dialect
- Age
- 350 speakers

Systems
Text databases
1. Mirsk 3gram LM
2. Mirsk 4gram LM
Dictionaries
1. eSpeak dictionary
2. Phonix dictionary
3. Combination dictionary
   - Map from one phonetic alphabet to the other

Multiple transcriptions
- Tretten (13): tRAdn, tRAt@n
- 2012: totusEnVtol ("to tusind og tolv", two thousand and twelve), tywetol ("tyve tolv", twenty twelve)

Phone mapping
  {    →  &, ?&
  {:   →  &, ?&
  @    →  @, 3
  2    →  ?W, W
  2:   →  ?W, W
  6    →  V
  6:   →  V:
  9    →  W
  9:   →  W
  a    →  &
  A    →  A

Kaldi ASR systems: training
Kaldi ASR toolkit [Povey et al., 2011]
1. Train monophone system
2. Create triphone system from monophone system
3. Optimise systems with different metrics (WER/MMI)
4. Use different combinations of LMs
5. Use additional features

Baselines (WER, %)
eSpeak
  System  LM     WER
  mono    3gram  50.33
  mono    4gram  50.16
  tri1    3gram  27.25
  tri1    4gram  26.27
  tri2a   3gram  24.72
  tri2a   4gram  24.25
  tri2b   3gram  22.88
  tri2b   4gram  22.60
  tri3b   3gram  25.76
  tri3b   4gram  25.48
  tri4a   3gram  21.06

Phonix
  System  LM     WER
  mono    3gram  60.09
  mono    4gram  60.09
  tri1    3gram  37.80
  tri1    4gram  36.92
  tri2a   3gram  33.77
  tri2a   4gram  31.13
  tri2b   3gram
  tri2b   4gram
  tri3b   3gram
  tri3b   4gram
  tri4a   3gram

Combination (WER, %)
  System  LM     eSpeak  Combination
  mono    3gram  50.33   50.21
  mono    4gram  50.16   50.23
  tri1    3gram  27.25   26.92
  tri1    4gram  26.27
  tri2a   3gram  24.72   25.82
  tri2a   4gram  24.25
  tri2b   3gram  22.88   23.16
  tri2b   4gram  22.60
  tri3b   3gram  25.76   20.52
  tri3b   4gram  25.48
  tri4a   3gram  21.06   18.55

Critique: comparison
- Genre
  - The dictation task is more difficult
  - Dictation is closer to the post-editing (PE) scenario
- Danish state-of-the-art ASR: 20-50% WER [Molgaard et al., 2007]
- Best system: 18.55% WER
- Commercial system WER: see e.g. the META-NET white paper [Pedersen et al., 2012]

Critique: medical dictation
- More difficult task
  - LM accuracy
  - Very large vocabulary
  - Low error tolerance
- World knowledge
  - Medical history
  - Treatment codes

Might be good enough for ...
- Post-editing
- Speech-to-speech translation
  - Access to additional knowledge
  - Pass the n-best list to the MT system
  - Leverage information in the translation probabilities to choose the right translation

References
Duddington, J. (2010). eSpeak text to speech.

Henrichsen, P. (2014). Phonix: Danish grapheme-to-phoneme transcriber. To appear in Translation in transition: between cognition, computing and technology, 1.

Molgaard, L., Jorgensen, K., and Hansen, L. K. (2007). CastSearch: context based spoken document retrieval. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), volume 4, pages IV-93. IEEE.

Pedersen, B. S., Wedekind, J., Bøhm-Andersen, S., Henrichsen, P. J., Hoensetz-Andresen, S., Kirchmeier-Andersen, S., Kjærum, J. O., Larsen, L. B., Maegaard, B., Nimb, S., Rasmussen, J.-E., Revsbech, P., and Thomsen, H. E. (2012). Det danske sprog i den digitale tidsalder / The Danish Language in the Digital Age. META-NET White Paper Series. Georg Rehm and Hans Uszkoreit (Series Editors). Springer.
Available online at http://www.meta-net.eu/whitepapers.

Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., and Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.