1aSC22. Toward understanding the role of formant transitions for distinctions of stops from glides. Sally G. Revoile, Peggy B. Nelson, and Lisa Holden-Pitt (Gallaudet Univ., Ctr. for Auditory and Speech Sci., 800 Florida Ave., N.E., Washington, DC 20002)

Our understanding is incomplete of the properties of vowel formant transitions that contribute to distinctions of voiced stop and glide consonants in speech. Research appears to have established some of the important transition cues for discernment of bilabial synthetic stops versus glides. However, the stop/glide transitions studied have typically been more stylized than those found in natural speech. This investigation examined the importance of transitions to listeners' identification of initial stops and glides in spoken /CVk/ syllables. Performance was assessed for the stops and glides with progressive deletion of segments from the syllables' onsets. Bilabial and velar stops and glides as well as alveolar stops were tested in /Cuk/, /Cak/, /Ca:k/ contexts to examine differences in transition use among phoneme environments. Twelve normal-hearing young adults participated as listeners. In general, when the initial stop bursts were deleted, the F2 transition frequency extent was significantly correlated with subjects' consonant identification response patterns. That is, longer F2 frequency extents yielded a higher percentage of glide responses. In addition, shorter F2 frequency extents resulted in a higher proportion of "no initial consonant" responses. Neither F2 transition duration nor F1 transition duration/frequency extent significantly correlated with the subjects' consonant identifications.

1aSC23. The role of formant synchrony in the coherence of vowels. Peter C. Gordon and Erika Manning (Dept. of Psych., Univ. of North Carolina, Chapel Hill, NC 27599-3270)

The coherence of vowels as auditory objects was studied by comparing identification thresholds in noise for synthetic vowel sounds (differing only in the center frequency of a single formant) to identification thresholds for the distinctive formant presented in isolation. The bandwidth of the noise masker was limited so that it only interfered with perception of the distinctive formant. Thresholds for accurately identifying the vowel sounds were lower than those for identifying the isolated formant. This demonstrates that vowel sounds cohere in the sense that unmasked formants reduce the masking of a formant embedded in noise. The advantage of a complete vowel over an isolated formant appears to depend on the temporal alignment of the formants. When the onset of the distinctive formant coincides with the offset of the other formants, then listeners can still identify the vowel sound in modest amounts of noise. However, in this case thresholds are not lower for vowel identifications than for identifications of isolated formants. This indicates that temporal synchrony plays a basic role in the psychoacoustic coherence of vowels.

1aSC24. Dynamic and static properties of imaged speech sounds. Deborah A. Gagnon (Moss Rehab. Res. Inst., 1200 W. Tabor Rd., Philadelphia, PA 19141)

The type of information stored in memory for speech sounds was tested using a primed, speeded classification task. The relationship between prime and target was varied in terms of phoneme constituency, phoneme order, or both. Primes were presented either auditorally or visually, allowing for a contrast between perceptual and imaged speech codes. Two other manipulations were made to assess whether the temporal nature of the stimuli, the stimulus quality, or possibly both, play a role in determining imageability: (1) Stimuli either contained stop (dynamically cued) or fricative (relatively statically cued) consonants; and (2) stimuli were either natural or synthetic. Inhibitory effects were found when an auditory prime was presented at a 100-ms ISI, supporting earlier evidence for a positionally specific perceptual speech code (Gagnon and Sawusch, 1992). It was also found that both the manipulation of the type of consonant (stop versus fricative) present in the target and the quality of the stimulus set (natural versus synthetic) had an effect on imageability, supporting both a temporal nature (Surprenant, 1992) and stimulus quality account of imageability. These results will be discussed within the context of current theories of memory, imagery, and speech perception. [Work supported by NIDCD Grant R01 DC00219 to SUNY at Buffalo and Mark Diamond Research Fund grant to Deborah A. Gagnon.]

1aSC25. Quantitative measures of envelope cues in speech recognition. Johnny Saade (Dept. of Elec. Eng., Univ. of California, Los Angeles and House Ear Inst., Los Angeles, CA 90057), Fan-Gang Zeng, John J. Wygonski, Robert V. Shannon, Sigfrid D. Soli (House Ear Inst., Los Angeles, CA 90057), and Abeer Alwan (Univ. of California, Los Angeles, CA)

A quantitative procedure is derived to evaluate the relative contribution of envelope cues to speech recognition. Recognition data of 16 consonants in the /aCa/ form were collected using signal-correlated noise stimuli in seven normal-hearing listeners. Several distance measures were calculated directly from duration and amplitude of the acoustic envelope. One amplitude distance measure was the Euclidean distance which was computed from the squared difference of the sample-by-sample amplitudes. The second measure was the envelope difference index (EDI) [Fortune et al., Ear Hear. 15, 93-95 (1994)] which was computed from the absolute value of the difference of the sample-by-sample amplitudes. A multidimensional scaling analysis was used to convert the perceptual confusion matrix into a distance matrix and to normalize the different distance measures. Correlation coefficients were computed between the different distance measures and the perceptual data. Preliminary analysis of data from six stop consonants showed that the consonant duration alone is sufficient to explain the perceptual data (r=0.92). Although Euclidean distance conveyed less information (r=0.75) than duration, it was a better measure than the EDI (r=0.31). Evaluation of these measures on the full 16 consonant set will be discussed.

1aSC26. Onset-sensitive time-frequency masking and its application to speech recognition. Kiyoaki Aikawa (ATR Human Information Process. Res. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan)

This paper proposes an onset-sensitive time-frequency masking mechanism in order to improve dynamic feature extraction. Application of the proposed mechanism to Japanese 23-phoneme recognition using hidden Markov models demonstrated that onset-sensitive MASP outperforms time-invariant MASP. Masked Spectrum (MASP) [Aikawa et al., Proc. ICASSP 93 II, 668-671 (1993)] is a new spectral representation incorporating time-frequency forward masking and has been reported to provide excellent performance when used for speaker-dependent and speaker-independent speech recognition. The masking pattern production mechanism was previously modeled by a time-invariant time-frequency filter, but the masking level rises at the onsets and offsets in a speech sound [T. Hirahara, J. Acoust. Soc. Jpn. E 12 (2), 57-68 (1991); E. Miyasaka, J. Acoust. Soc. Jpn. 39 (9), 614-623 (1983)]. This phenomenon suggests that an adaptive masking mechanism is effective for balancing instantaneous and transitional spectral features depending on vowels or consonants. The masking pattern is calculated as the weighted sum of the smoothed preceding spectra obtained by time-distance-dependent spectral smoothing lifters. The masking level is controlled by the slope of the temporal contour of the instantaneous sound energy. The masked spectrum is obtained by subtracting the masking pattern from the current spectrum. Onset-offset-sensitive masking models are also examined.

1aSC27. Difference limens for vowel-vowel formant transitions. William A. Ainsworth (Dept. of Commun. and Neurosci., Keele Univ., Keele, Staffordshire ST5 5BG, United Kingdom)

Second-formant transitions in vowel-vowel utterances are not always of the same duration as those of the first formant and they often begin and end at different instants. In other cases the formant frequencies sometimes first move in a different direction from their final targets. In order to investigate whether these formant movements are perceptually significant, a number of difference limens for formant transitions have been measured for synthesized versions of the vowel pair /a/-/i/. It was found that differences in duration between the first- and second-formant transitions of up to 70 ms were not perceived. It was also found that delays between the starts and ends of the first and second transitions of up to 50 ms were not perceived. These results suggest that the differences in durations and delays between the first and second formants found in natural vowel-vowel utterances are unlikely to be of perceptual significance. [Work supported by EC Science Contract SC1-CT92-0786.]

3244 J. Acoust. Soc. Am., Vol. 97, No. 5, Pt. 2, May 1995. 129th Meeting: Acoustical Society of America
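Illustrative note on abstract 1aSC25: the two amplitude distance measures it describes, one built on squared and one on absolute sample-by-sample differences of two envelopes, can be sketched as below. This is a minimal sketch and not the authors' procedure. The function names are hypothetical, the envelopes are assumed to be already extracted and time-aligned to equal length, and the unit-mean scaling and division by 2N in the EDI function are assumptions that the abstract does not state.

```python
import math


def euclidean_distance(env_a, env_b):
    """Square root of the summed squared sample-by-sample differences."""
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a, env_b)))


def envelope_difference_index(env_a, env_b):
    """Absolute-difference measure between two envelopes.

    Assumption (not given in the abstract): each envelope is scaled to
    unit mean, and the summed absolute difference is divided by 2N so
    that the result lies between 0 (same shape) and 1.
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    n = len(env_a)
    mean_a = sum(env_a) / n
    mean_b = sum(env_b) / n
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("envelopes must have a positive mean amplitude")
    total = sum(abs(a / mean_a - b / mean_b) for a, b in zip(env_a, env_b))
    return total / (2 * n)


if __name__ == "__main__":
    # Two made-up amplitude envelopes, for illustration only.
    first = [0.1, 0.4, 0.9, 0.4, 0.1]
    second = [0.1, 0.2, 0.5, 0.8, 0.3]
    print(euclidean_distance(first, second))
    print(envelope_difference_index(first, second))
```

Under these assumptions the Euclidean measure is sensitive to overall level differences, while the scaled absolute-difference measure depends only on envelope shape. That is one possible reason the two measures could correlate differently with perceptual data.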