Description of the Czech Lexical Phonological Corpus
[last updated 19/1/2015]

1 Phonological Analysis

The Czech Lexical Phonological Corpus (henceforth: CzPhLexCorp) consists primarily of lexemes and their phonological representations. The lexemes have been phonologically transcribed in accordance with the phonological analysis of Modern Czech presented in Phonotactics of Czech by Aleš Bičan (Peter Lang, 2013), which gives further details.

A phonological description accounts for two types of properties of linguistic data: segmental phonetic properties and suprasegmental phonetic properties. A phonological theory provides phonological models referring to sets of these properties. Three phonological models are relevant here: the phoneme, the phonotagm and the para-phonotactic feature.

The phoneme has three equivalent definitions: a) an unordered set of distinctive features; b) a minimal phonotactic entity; c) a set of allophones. It accounts for segmental phonetic properties of speech sounds. See 1.1.

The phonotagm is defined as a self-contained (autonomous, complete) combination of phonemes. One of the phonemes is the nuclear element; the others are peripheral or non-nuclear (and usually optional) elements within the combination. Phonotagms roughly account for segmental phonetic properties of syllables. See 1.2.

The para-phonotactic features are additional phonological properties which either distinguish one phonological form from another (e.g. tones in tone languages) or determine the grouping of phonotagms (e.g. accent group). They roughly account for suprasegmental phonetic properties of utterances. See 1.3.

1.1 Phonemes of Czech

According to their function within phonotagms, three classes of phonemes are recognized for Czech: vowels, consonants and semiconsonants.

1.1.1 Vowels

Vowels are phonemes which are always nuclear; the classification of the Czech vowels according to distinctive features, together with the range of their allophones, is given in table 1. The diphthongs are interpreted as single phonemes (see Bičan 2013: 37–40).

                  Short       Long        Diphthongal
Front     high    /i/ [ɪ]     /ī/ [iː]
          mid     /e/ [ɛ]     /ē/ [ɛː]    /ë/ [ɛu̯]
Central           /a/ [a]     /ā/ [aː]    /ä/ [au̯]
Back      high    /u/ [u]     /ū/ [uː]
          mid     /o/ [o]     /ō/ [oː]    /ö/ [ou̯]
Table 1: Vowels of Czech with their allophones

1.1.2 Consonants

The consonants are phonemes which are always peripheral (non-nuclear); the classification of the consonants of Czech according to distinctive features, together with the range of their allophones, is given in table 2.

             Labial               Alveolar                        Palatal              Velar
             v’less    v’ed       v’less        v’ed              v’less    v’ed       v’less    v’ed
Occlusive    /p/ [p]   /b/ [b]    /t/ [t] [c]   /d/ [d] [ɟ]       /ť/ [c]   /ď/ [ɟ]    /k/ [k]   /g/ [ɡ]
Fricative    /f/ [f]   /v/ [v]    /s/ [s]       /z/ [z]           /š/ [ʃ]   /ž/ [ʒ]    /x/ [x]   /h/ [ɦ]
Nasal        /m/ [m] [ɱ]          /n/ [n] [ɲ] [ŋ]                 /ň/ [ɲ]
Outside the proportional system: /j/ [j] and /ř/ [r̝] [r̝̊]
Table 2: Consonants of Czech in the context of relevance and their allophones

1.1.2.1 Affricates

The affricates [t͜s], [d͜z], [t͜ʃ] and [d͜ʒ] are interpreted as combinations of two phonemes (see Bičan 2013: 33–7 for the defense of this analysis). Their interpretation is shown in table 3. They are subject to neutralization (see 1.1.4).

Affricate   In contexts of relevance                 In contexts of neutralization
            Interpretation   Example                 Interpretation   Example
[t͜s]        /Ts/             /TseSta/ cesta          /TS/             /peTS/ pec, /klaTSki/ klacky
[d͜z]        /Tz/             /Tzinkaťi/ dzinkati     /TS/             /leTSKgo/ leckdo
[t͜ʃ]        /Tš/             /TšaS/ čas              /TŠ/             /mīTŠ/ míč, /poTŠti/ počty
[d͜ʒ]        /Tž/             /Tžem/ džem             /TŠ/             /lēTŠba/
Table 3: Interpretation of the affricates

The affricates are distinguished from the sequences of pre-alveolar stops (realized with an explosion) and pre-alveolar or post-alveolar fricatives [t.s], [d.z], [t.ʃ], [d.ʒ] (cf. práce × prát se, počít × podšít). The stop-fricative sequences are in this case interpreted as signals of boundaries because this pronunciation is found only across grammatical boundaries (see 1.3.2).

1.1.3 Semiconsonants

Semiconsonants are phonemes which can be both nuclear entities (like vowels) and peripheral entities (like consonants). Two such phonemes are recognized for Czech: /r/, realized as syllabic [r̩] and non-syllabic [r], and /l/, realized as syllabic [l̩] and non-syllabic [l].

1.1.3.1 Notation of nuclear /r/ and /l/

For the sake of convenience, the nuclear semiconsonants are transcribed /R/, /L/ in CzPhLexCorp (e.g. /pLnī/ plný).

1.1.3.2 Syllabic [m̩]

The syllabic [m̩] that is sometimes pronounced in sedm, osm and similar words is interpreted as a realization of the sequence /um/ because it is in free variation with [um], hence /sedum/ and /osum/.

1.1.4 Neutralization and archi-phonemes

Neutralization is the contextual irrelevance of a difference between two or more phonemes which is relevant (i.e. distinguishes meaning) in other contexts (= contexts of relevance). The phonemes occurring in contexts of neutralization are archi-phonemes, defined as the intersection of two or more phonemes (i.e. the phonemes between which the neutralization takes place). Two neutralizations are recognized for Czech: neutralization of voicing, and neutralization of the place of articulation for nasals.

1.1.4.1 Neutralization of voicing

The neutralization of voicing is the contextual irrelevance of the difference between the voiceless and voiced consonants. It results in voicing archi-phonemes; their classification together with the range of realization is given in table 4.

             Labial           Alveolar            Palatal          Velar
Occlusive    /P/ [p] [b]      /T/ [t] [d]         /Ť/ [c] [ɟ]      /K/ [k] [ɡ]
Fricative    /F/ [f] [v]      /S/ [s] [z]         /Š/ [ʃ] [ʒ]      /X/ [x] [ɦ]
Nasal        /m/ [m] [ɱ]      /n/ [n] [ɲ] [ŋ]     /ň/ [ɲ]
Outside the proportional system: /j/ [j] and /ř/ [r̝], [r̝̊]
Table 4: Consonants of Czech in the context of neutralization of voicing with their allophones

Neutralization of voicing takes place:
1) At the end of phonological words (see 1.3.2 on the notion of phonological word): /peS/ pes, /beS-ūTšelnī/ bezúčelný;
2) Before a voiceless or voiced consonant (with the exception of /v/): /StāT/ stát, /Sdar/ zdar (cf. /svāT/ svát × /zvāT/ zvát);
3) Before a voicing archi-phoneme: /FSpřīmiT/ vzpřímit, /xePSkī/ chebský;
4) Before /ř/ if the latter is followed by a voiceless or voiced consonant (with the exception of /v/): /Třťina/ třtina, /XřbeT/ hřbet (cf. /břve/ Břve);
5) Before /ř/ if the latter occurs at the end of a phonological word: /pePř/ pepř, /dovniTř/ dovnitř.

The voicing archi-phonemes are phonologically neither voiceless nor voiced. Their phonetic (realizational) voicing is completely predictable from the context they occur in and is thus non-phonological.

1.1.4.2 Neutralization of the place of articulation of nasals

The neutralization of the place of articulation of nasals is the contextual irrelevance of the difference between /m/, /n/ and /ň/. It results in the nasal archi-phoneme /M/, which is always realized as [m]. The neutralization is a consequence of the fact that no nasal other than [m] is found in certain contexts, which makes the place of articulation completely predictable there.

Neutralization of the place of articulation of nasals takes place:
1) Phonotagm-initially before any consonant: /Mdlo/ mdlo, /MhöřiT/ mhouřit;
2) Phonotagm-initially before any semiconsonant (either nuclear or non-nuclear): /sMlöva/ smlouva, /zMRznöT/ zmrznout;
3) Across phonotagms in between a consonant and a nuclear semiconsonant: /posMRtnī/ posmrtný, /odMLTšeT/ odmlčet.
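
Returning to the neutralization of voicing (1.1.4.1), the two word-final rules (1 and 5) can be illustrated with a minimal Python sketch. It is not part of CzPhLexCorp: the function name, the idea of starting from an un-neutralized transcription, and the one-character-per-phoneme input (with precomposed letters such as "ř") are assumptions made for illustration only. Rules 2 to 4 are not covered.

```python
# Voiced/voiceless consonant pairs (table 2) mapped to their voicing
# archi-phonemes (table 4).
ARCHI = {
    "p": "P", "b": "P",
    "t": "T", "d": "T",
    "ť": "Ť", "ď": "Ť",
    "f": "F", "v": "F",
    "s": "S", "z": "S",
    "š": "Š", "ž": "Š",
    "k": "K", "g": "K",
    "x": "X", "h": "X",
}


def neutralize_word_final(phonemes):
    """Apply rules 1 and 5 only to a single phonological word.

    `phonemes` is a sequence of one-character phoneme symbols
    (hypothetical input format). Rules 2-4 are deliberately left out.
    """
    out = list(phonemes)
    if not out:
        return out
    last = len(out) - 1
    if out[last] == "ř" and last > 0:
        # Rule 5: consonant before a word-final /ř/.
        out[last - 1] = ARCHI.get(out[last - 1], out[last - 1])
    else:
        # Rule 1: consonant at the end of the phonological word.
        out[last] = ARCHI.get(out[last], out[last])
    return out


print("".join(neutralize_word_final("pes")))
print("".join(neutralize_word_final("pepř")))
```

Symbols that have no voicing counterpart (vowels, nasals, /j/, /r/, /l/) are passed through unchanged, which mirrors the fact that only the paired consonants of table 2 have archi-phonemes in table 4.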

1.2 Phonotagms

Phonotagms are self-contained combinations of phonemes. In Czech a phonotagm is usually realized as a single syllable except for some special cases: /Stārl/, /karl/, /tirl/ are all single phonotagms (the final /l/ is not nuclear; see Bičan 2013: 140ff.).

1.3 Para-phonotactic features

In Czech, para-phonotactic features determine the grouping of phonotagms. Two such groupings are recognized for Czech: the phonological word and the accent group.

1.3.1 Accent group

The accent group is a group of phonotagms gathered together by features of accent. Accent groups are determined by internal coherence and melodic contour (Palková 2013). The same sequence of phonotagms (syllables) may correspond to different accent groups, the difference between which is determined by the melodic contour (e.g. od ní mají × odnímají). In CzPhLexCorp the concept of accent group is in accordance with the theory of Zdena Palková, and the grouping of phonotagms into accent groups follows the rules she formulated and applied to automatic speech synthesis of Czech (Palková 2004).

1.3.2 Phonological word

The phonological word is recognized as a constituent of accent groups where the glottal stop or other boundary-marking phenomena occur. The notion is needed to account for differences such as the following:

[potʔokɛm] pod okem        [potokɛm] potokem
[fʔaktɛx] v aktech         [faktɛx] faktech
[pot.ʃiːt] podšít          [pot͜ʃiːt] počít
[vrtɛx] v rtech            [vr̩tɛx] vrtech

All of the examples correspond to single accent groups, but the left-hand examples are interpreted as corresponding to two phonological words. For more details see Bičan (2014).

1.3.3 Notation

Although the corpus is phonological, and accent groups and phonological words are phonological units, several types of boundary-signaling symbols are used so that it is easy to determine which grammatical units the accent groups and phonological words correspond to. See table 5 for the notation used.

Symbol   Marks the boundary of   Corresponds grammatically to the boundary of   Example
+        accent group            word or stem in compound words                 /hoďiT+sebö/ hodit sebou, /TšeSko+slovenSkī/ česko-slovenský
=        phonological word       word                                           /nuďiT=se/ nudit se
-        phonological word       morpheme                                       /poT-ūředňīK/ podúředník
_        orthographical word     word                                           /pēTsi_se/ péci se
.        phonotagm               –                                              /po.lēF.ka/ polévka
Table 5: Boundary-signaling marks used in the corpus

2 Structure of the CzPhLexCorp

2.1 Format

CzPhLexCorp is available in the Comma Separated Value format (.csv). This format allows for storing tabular data in plain-text form. It can be opened and edited in applications designed for editing .csv files or imported into editors such as Microsoft Excel, Microsoft Access etc. (see http://office.microsoft.com/en-001/excel-help/import-or-export-text-txt-or-csv-files-HP010342598.aspx for an explanation of how to import .csv files into MS Excel). The data are separated by the separator “;”, i.e. the semicolon. Once opened or imported, the data will be displayed as a table.

2.2 Heading

The first row of the table is the heading. Table 6 shows a division of the heading into five sections. The sections are reproduced in tables 7–11. The other rows contain the data. See also Bičan (ms. 1).

Orthographic form    Phonological representation    Phonological properties    Parts of speech    Occurrence in dictionaries
Table 6: Sections of the heading

Ortho
Table 7: Columns in the section Orthographic form

PhRep    Syllab
Table 8: Columns in the section Phonological representation

Length    Phtagms    CVStr    Place    Manner    Voicing    Horiz    Vertic    Quant
Table 9: Columns in the section Phonological properties

PoS
Table 10: Columns in the section Parts of speech

SSČ    SSJČ    PSJČ    SN1    SN2    CSN    SPr    SVaz    FSČ    ASCS    VSČ
Table 11: Columns in the section Occurrence in dictionaries

2.3 Columns

The content of the respective columns is described below.
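
As an alternative to a spreadsheet application, the semicolon-separated table described in 2.1 and 2.2 can also be read programmatically. The following Python sketch uses only the standard `csv` module. The sample rows, the subset of columns, the absence of slashes around the PhRep values and the helper names `read_corpus` and `filter_rows` are assumptions made for illustration; the actual files contain all columns listed in tables 7–11.

```python
import csv
import io

# Hypothetical three-row excerpt with only four of the heading columns.
SAMPLE = (
    "Ortho;PhRep;Length;PoS\n"
    "pes;peS;3;1\n"
    "maso;maso;4;1\n"
    "hodit;hoďiT;5;5\n"
)


def read_corpus(handle):
    """Read a semicolon-separated table into a list of dicts keyed by the heading."""
    return list(csv.DictReader(handle, delimiter=";"))


def filter_rows(rows, column, value):
    """Keep the rows whose `column` equals `value` (values are compared as strings)."""
    return [row for row in rows if row[column] == value]


rows = read_corpus(io.StringIO(SAMPLE))
nouns = filter_rows(rows, "PoS", "1")  # in column PoS, the digit 1 marks nouns
print([row["Ortho"] for row in nouns])
```

For a file on disk, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is itself an assumption, since the description does not state how the files are encoded.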
Standard editors such as MS Excel allow searching and filtering the data according to criteria specified for each column (see http://office.microsoft.com/en-001/excel-help/filter-data-in-a-range-or-table-HP010073941.aspx on filtering data in MS Excel).

2.3.1 Column Ortho

This column lists lexical items in their standard orthographic form.

2.3.2 Column PhRep

This column provides the phonological representation of the lexical items, i.e. the phonological form (PhF) of the item. The transcription follows the analysis outlined in section 1.

2.3.3 Column Syllab

This column provides the syllabification of the lexical items. Syllabification rules are stored in external files and can be freely replaced with alternative rules. At the moment, CzPhLexCorp follows the rules proposed by Kučera – Monroe (1968).

2.3.4 Column Length

This column contains numbers corresponding to the length of a PhF. The length equals the number of phonemes within the PhF. Boundary-signaling symbols are not counted.

2.3.5 Column Phtagms

This column contains numbers corresponding to the number of phonotagms within a PhF. It equals the number of nuclear phonemes within the PhF.

2.3.6 Column CVStr

This column reproduces the structure of a PhF according to the membership of its constituent phonemes in the classes of peripheral entities and nuclear entities. The symbols used are given in table 12. Boundary-signaling symbols are not included in the representation.

Symbol   Stands for
C        consonants and non-nuclear semiconsonants
V        vowels
W        nuclear semiconsonants
Table 12: Symbols used in column CVStr

2.3.7 Column Place

This column reproduces the structure of a PhF according to the place of articulation of constituent consonants (see table 2). The symbols used are given in table 13. The category “place” is not relevant for vowels and nuclear semiconsonants, and they are thus transcribed with lower-case letters.

Symbol   Stands for
L        labials
A        alveolars
P        palatals
K        velars
I        isolated consonants and semiconsonants (/M/, /j/, /ř/, /r/, /l/)
v        vowels
w        nuclear semiconsonants
Table 13: Symbols used in column Place

2.3.8 Column Manner

This column reproduces the structure of a PhF according to the manner of articulation of constituent consonants (see table 2). The symbols used are given in table 14. The category “manner” is not relevant for vowels and nuclear semiconsonants, and they are thus transcribed with lower-case letters.

Symbol   Stands for
O        occlusives
F        fricatives
N        nasals
R        sonants (/j/, /ř/, /r/, /l/)
v        vowels
w        nuclear semiconsonants
Table 14: Symbols used in column Manner

2.3.9 Column Voicing

This column reproduces the structure of a PhF according to the voicing of constituent consonants (see table 2). The symbols used are given in table 15. The category “voicing” is not relevant for vowels and nuclear semiconsonants, and they are thus transcribed with lower-case letters.

Symbol   Stands for
U        voiceless
Z        voiced
X        indifferent (nasals and voicing archi-phonemes)
v        vowels
w        nuclear semiconsonants
Table 15: Symbols used in column Voicing

2.3.10 Column Horiz

This column reproduces the structure of a PhF according to the horizontal axis of constituent vowels (see table 1). The symbols used are given in table 16. The category “horizontal axis” is not relevant for consonants and semiconsonants, and they are thus transcribed with lower-case letters.

Symbol   Stands for
Q        front
E        central
B        back
c        consonants and non-nuclear semiconsonants
w        nuclear semiconsonants
Table 16: Symbols used in column Horiz

2.3.11 Column Vertic

This column reproduces the structure of a PhF according to the vertical axis of constituent vowels (see table 1). The symbols used are given in table 17. The category “vertical axis” is not relevant for consonants and semiconsonants, and they are thus transcribed with lower-case letters.
The category is also not relevant for non-high and non-mid vowels; they are transcribed with a lower-case letter, too.

Symbol   Stands for
H        high
M        mid
v        non-high and non-mid vowels (/a/, /ā/, /ä/, /ë/, /ö/)
c        consonants and non-nuclear semiconsonants
w        nuclear semiconsonants
Table 17: Symbols used in column Vertic

2.3.12 Column Quant

This column reproduces the structure of a PhF according to the quantity (length) of constituent vowels (see table 1). The symbols used are given in table 18. The category “quantity” is not relevant for consonants and semiconsonants, and they are thus transcribed with lower-case letters.

Symbol   Stands for
S        short
G        long
D        diphthongal
c        consonants and non-nuclear semiconsonants
w        nuclear semiconsonants
Table 18: Symbols used in column Quant

2.3.13 Column PoS

This column provides information about the part of speech of a given entry. Each digit corresponds to a particular part of speech; see table 19.

Symbol   Stands for
1        nouns
2        adjectives
3        pronouns
4        numerals
5        verbs
6        adverbs
7        prepositions
8        conjunctions
9        particles
0        onomatopoeia
Table 19: Symbols used in column PoS

2.3.14 Columns for dictionaries

The remaining columns mark whether the lexical item is recorded in the dictionaries and databases of Czech. The digit 0 means that the item is not recorded; the digit 1 means that it is recorded. The list of the dictionaries currently included in the corpus is given in table 20.

Abbreviation   Stands for
SSČ            Slovník spisovné češtiny (4th edition, 2005)
SSJČ           Slovník spisovného jazyka českého (2nd edition, 1989)
PSJČ           Příruční slovník jazyka českého (1935–1957)
CSN            Co v slovnících nenajdete (Novinky v současné slovní zásobě) (1994)
SN1            Nová slova v češtině. Slovník neologizmů 1 (1998)
SN2            Nová slova v češtině. Slovník neologizmů 2 (2004)
SPr            Slovesa pro praxi. Valenční slovník nejčastějších českých sloves (1997)
SVaz           Slovník slovesných, substantivních a adjektivních vazeb a spojení (2005)
FSČ            Frekvenční slovník češtiny (2004)
ASCS           Akademický slovník cizích slov A-Ž (1995)
VSČ            Výslovnost spisovné češtiny (1978)
Table 20: Dictionaries included in the main corpus

3 Sub-corpora

3.1 Municipalities and their parts

This corpus consists of names of the Czech municipalities and their parts (as of 31/12/2013). It is structured like the main corpus. See table 21 for its heading.

Ortho    PhRep    Syllab    Length    Phtagms    CVStr    Place    Manner    Voicing    Horiz    Vertic    Quant    Type
Table 21: The heading of the Municipalities sub-corpus

The content of all columns except for Type is the same as that of the main corpus. Column Type marks whether the item in column Ortho is a name of a municipality (M) or of its part (P).

3.2 Given names and their hypocorisms

This corpus consists of a list of the 785 most common Czech given names and their hypocorisms (5,724 entries in total). It is structured like the main corpus. See table 22 for its heading.

Ortho    PhRep    Syllab    Length    Phtagms    CVStr    Place    Manner    Voicing    Horiz    Vertic    Quant    Type    Sex
Table 22: The heading of the Given Names sub-corpus

The content of all columns except for Type and Sex is the same as that of the main corpus. Column Type marks whether the item in column Ortho is a basic form of a given name (B) or a hypocorism (H). Column Sex marks whether it is a name of a male (M) or a female (F).

3.3 Czech botanical names

This corpus consists of a list of 2,549 Czech botanical names. It is structured like the main corpus. See table 23 for its heading. The content of all columns except for Type is the same as that of the main corpus. Column Type contains only the value “botanical”.

Ortho    PhRep    Syllab    Length    Phtagms    CVStr    Place    Manner    Voicing    Horiz    Vertic    Quant    Type
Table 23: The heading of the Botanical Names sub-corpus

3.4 Czech zoological names

This corpus consists of a list of 30,469 Czech zoological names. It is still being updated and enlarged. It is structured like the main corpus. See table 24 for its heading.

Ortho    PhRep    Syllab    Length    Phtagms    CVStr    Place    Manner    Voicing    Horiz    Vertic    Quant    Type
Table 24: The heading of the Zoological Names sub-corpus

The content of all columns except for Type is the same as that of the main corpus. Column Type provides information about the zoological class of a given name. See table 25 for a list of the classes included. The corpus is going to be supplemented with names of other classes.

Type      Stands for
FCC       fungi, corals, ctenophores
mammals   mammals
fish      fish
Table 25: Zoological names included in the Zoological Names corpus

4 Evaluation files

This section describes the content of the evaluation files and lists the abbreviations used in them.

4.1 Abbreviations used

Table 26 provides a list of entities used in evaluation files. Evaluations (statistics etc.) are computed for these entities.

Type        Stands for
Pgm/DiGto   phonotagm in phonological words as tokens
Pgm/DiGty   phonotagm in phonological words as types
DiGto       phonological word (diaereme group) as a token
DiGty       phonological word (diaereme group) as a type
AcGto       accent group as a token
AcGty       accent group as a type
Wto         orthographical word as a token
Wty         orthographical word as a type
Table 26: Types of entities in evaluation files

Table 27 provides a list of abbreviations used for various types of phonemes.

Type   Stands for
Voc    vowels
Con    consonants
Sem    semiconsonants
NuS    nuclear semiconsonants
PeS    peripheral (non-nuclear) semiconsonants
NucP   nuclear phonemes (vowels and nuclear semiconsonants)
PerP   peripheral phonemes (consonants and non-nuclear semiconsonants)
Table 27: Types of phonemes in evaluation files

Table 28 provides a list of abbreviations used for various types of phoneme combinations.

Type        Stands for
PerComb     peripheral combinations, i.e. combinations of peripheral phonemes
PreComb     pre-nuclear peripheral combinations
PostComb    post-nuclear peripheral combinations
InterComb   inter-nuclear peripheral combinations
PhoPho      combinations of two phonemes
CC          combinations of two peripheral phonemes
CV          combinations of a peripheral phoneme and a nuclear phoneme
VC          combinations of a nuclear phoneme and a peripheral phoneme
VV          combinations of two nuclear phonemes
Table 28: Types of phoneme combinations in evaluation files

Table 29 provides a list of other abbreviations used.

Type   Stands for
Occ    absolute occurrence
Rat    ratio of occurrence, i.e. relative occurrence
Table 29: Other abbreviations used in evaluation files

Absolute occurrence is counted as the total occurrence of an entity within a given corpus. Relative occurrence (= ratio of occurrence) is counted as the absolute occurrence of an entity divided by the total occurrence of all entities.

4.2 Contents of the evaluation folder

The evaluation folder contains a number of files, each of which provides a different output of the evaluation.

4.2.1 Main folder

The following files are included in the main folder. Except for overview.txt, all are .csv files and contain tabular data. In all of them the first row contains a heading and the first column contains a list of entities for which the values are counted. See above for the abbreviations used.

cc-pgm.csv
A list of combinations of two PerP attested at any position within Pgm and their absolute and relative occurrence.
e.g. /tr/ in /koS.tra/

cc.csv
A list of combinations of two PerP attested at any position within DiG and their absolute and relative occurrence.
e.g. /vl/ in /vlāda/

cv-pattern-acg.csv
A list of consonantal-vocalic patterns for AcG and their absolute and relative occurrence.
C = a PerP
V = a NucP
e.g. /nuďiT=se/ has the pattern CVCVCCV

cv-pattern-dig.csv
A list of consonantal-vocalic patterns for DiG and their absolute and relative occurrence.
e.g. /maTka/ has the pattern CVCCV

cv-pattern-pgm.csv
A list of consonantal-vocalic patterns for Pgm and their absolute and relative occurrence.
e.g. /maT/ in /maT.ka/ has the pattern CVC

cv-pattern-w.csv
A list of consonantal-vocalic patterns for W and their absolute and relative occurrence.
e.g. /se/ in /nuďiT=se/ has the pattern CV

cv-pgm.csv
A list of combinations of a PerP + a NucP attested within Pgm and their absolute and relative occurrence.
e.g. /ma/ in /maT.ka/

cv.csv
A list of combinations of a PerP + a NucP attested at any position within DiG and their absolute and relative occurrence.
e.g. /da/ in /nuda/

dig-nucp-env-class-occ-ty.csv
Absolute occurrence of classes of NucP in various environments. The environments are listed in table 30.

Type    Stands for
#       at the end of DiG
V       before a PerP before a - boundary
CV      before a PerP followed by a NucP
C#      at the end of DiG before a single PerP
CombV   before a combination of PerP followed by a NucP
Comb#   at the end of DiG before a combination of PerP
Table 30: Environments of phoneme occurrence

dig-nucp-env-class-rat-ty.csv
Relative occurrence of classes of NucP in various environments (see table 30).

dig-nucp-env-ph-occ-ty.csv
Absolute occurrence of NucP in various environments (see table 30).

dig-nucp-env-ph-rat-ty.csv
Relative occurrence of NucP in various environments (see table 30).

dig2-seq-class-occ-ty.csv
Absolute occurrence of classes of NucP in various phonotagms attested within DiG.
M = non-final phonotagm
F = final phonotagm
digit = position of the phonotagm from the beginning (e.g. M1 = first medial phonotagm)

dig2-seq-class-rat-ty.csv
Relative occurrence of classes of NucP in various phonotagms attested within DiG; see dig2-seq-class-occ-ty.csv for abbreviations.

dig2-seq-ph-occ-ty.csv
Absolute occurrence of NucP in various phonotagms attested within DiG; see dig2-seq-class-occ-ty.csv for abbreviations.

dig2-seq-ph-rat-ty.csv
Relative occurrence of NucP in various phonotagms attested within DiG; see dig2-seq-class-occ-ty.csv for abbreviations.

inter.csv
A list of inter-nuclear combinations attested within DiG and their absolute and relative occurrence.
e.g. /St/ in /paSta/

length-acg.csv
Distribution of phonemic length in AcG and the absolute and relative occurrence of AcG of a given phonemic length.
Phonemic length = number of phonemes in a unit.

length-dig.csv
Distribution of phonemic length in DiG and the absolute and relative occurrence of DiG of a given phonemic length.
e.g. /peS/ has the phonemic length of 3

length-pgm.csv
Distribution of phonemic length in Pgm and the absolute and relative occurrence of Pgm of a given phonemic length.
e.g. /ma/ in /ma.so/ has the phonemic length of 2

length-w.csv
Distribution of phonemic length in W and the absolute and relative occurrence of W of a given phonemic length.
e.g. /se/ in /nuďiT=se/ has the phonemic length of 2

overview.csv
An overview and summary of the results. It contains additional results sorted into several categories.

ph-classes.csv
Frequency of phoneme classes. The classes are defined according to distinctive features of phonemes.

phonemes.csv
Frequency of individual phonemes and their absolute and relative occurrence. Symbol “lL” stands for the phoneme /l/ and symbol “rR” for the phoneme /r/. In the corpora, symbols “l” and “r” are used for non-nuclear /l/ and /r/, and symbols “L” and “R” for nuclear /l/ and /r/.
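
The kind of count reported in phonemes.csv can be illustrated with a short Python sketch that applies the definitions of absolute and relative occurrence from 4.1 to a list of phonological forms. It is not the procedure actually used to produce the evaluation files: the function name, the input format (one character per phoneme, precomposed letters, no surrounding slashes) and the two sample forms are assumptions made for illustration.

```python
from collections import Counter

# Boundary-signaling symbols (table 5) are not phonemes and are skipped.
BOUNDARIES = set("+=-_.")

# Nuclear and non-nuclear /l/ and /r/ are counted together, as in phonemes.csv.
MERGE = {"l": "lL", "L": "lL", "r": "rR", "R": "rR"}


def phoneme_occurrence(forms):
    """Return {symbol: (absolute occurrence, relative occurrence)} for the given forms."""
    counts = Counter()
    for form in forms:
        for symbol in form:
            if symbol in BOUNDARIES:
                continue
            counts[MERGE.get(symbol, symbol)] += 1
    total = sum(counts.values())
    # Relative occurrence = absolute occurrence / total occurrence of all phonemes.
    return {symbol: (n, n / total) for symbol, n in counts.items()}


result = phoneme_occurrence(["pLnī", "vlāda"])
print(result["lL"])
```

With these two forms, the nuclear /L/ of /pLnī/ and the non-nuclear /l/ of /vlāda/ are both counted under “lL”, so its absolute occurrence is 2 out of 9 phonemes in total.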

phonotagm-acg.csv
Distribution of AcG according to the number of Pgm and the absolute and relative occurrence of AcG of a given number of Pgm.
e.g. /poT=uTšitel/ contains 4 Pgm

phonotagm-dig.csv
Distribution of DiG according to the number of Pgm and the absolute and relative occurrence of DiG of a given number of Pgm.
e.g. /uTšitel/ contains 3 Pgm

phopho-pgm.csv
A list of combinations of any two phonemes attested at any position within Pgm and their absolute and relative occurrence.
e.g. /sl/ in /slo.vo/

phopho.csv
A list of combinations of any two phonemes attested at any position within DiG and their absolute and relative occurrence.
e.g. /ov/ in /slovo/

post-pgm.csv
A list of post-nuclear combinations attested within Pgm and their absolute and relative occurrence.
e.g. /ST/ in /pro.paST/

post.csv
A list of post-nuclear combinations attested within DiG and their absolute and relative occurrence.
e.g. /rKT/ in /infarKT/

pre-pgm.csv
A list of pre-nuclear combinations attested within Pgm and their absolute and relative occurrence.
e.g. /pr/ in /pro.paST/

pre.csv
A list of pre-nuclear combinations attested within DiG and their absolute and relative occurrence.
e.g. /Ml/ in /MloK/

vc-pgm.csv
A list of combinations of a NucP + a PerP attested within Pgm and their absolute and relative occurrence.
e.g. /aS/ in /pro.paST/

vc.csv
A list of combinations of a NucP + a PerP attested within DiG and their absolute and relative occurrence.
e.g. /oS/ in /živoST/

vv.csv
A list of combinations of two NucP attested within DiG and their absolute and relative occurrence.
e.g. /ao/ in /xaoS/

4.2.2 Subfolder quantity-pattern-dig

This subfolder contains files with quantity patterns for phonological words (DiG) with different numbers of Pgm (the digit = the number of Pgm).
S = short vowel
G = long vowel
D = diphthongal vowel
W = nuclear semiconsonant

5 References

Bičan, Aleš. 2013. Phonotactics of Czech. Peter Lang.
Bičan, Aleš. 2014. “K pojmu fonologické slovo v češtině”. Sophia Slavica (eds. Vít Boček – Bohumil Vykypěl), 13–23. Brno: Tribun EU.
Bičan, Aleš. ms. 1. “Fonologický lexikální korpus a jeho analýza”. Submitted. Draft available here: <http://www.ujc.cas.cz/phword>.
Kučera, Henry – Monroe, George. 1968. A Comparative Quantitative Phonology of Russian, Czech, and German. Elsevier.
Palková, Zdena. 2004. “The set of phonetic rules as a basis for the prosodic component of an autonomous TTS synthesis in Czech”. Phonetica Pragensia X, 33–46.
Palková, Zdena. 2013. “Prozodické vlastnosti češtiny ve vztahu k mezislovnímu sandhi”. Studie k moderní mluvnici češtiny 5, K české fonetice a fonologii, 106–118. Univerzita Palackého v Olomouci.