Quantitative measures of envelope cues in speech

1aSC22. Toward understanding the role of formant transitions for distinctions of stops from glides. Sally G. Revoile, Peggy B. Nelson, and Lisa Holden-Pitt (Gallaudet Univ., Ctr. for Auditory and Speech Sci., 800 Florida Ave., N.E., Washington, DC 20002)
Our understanding is incomplete of the properties of vowel formant transitions that contribute to distinctions of voiced stop and glide consonants in speech. Research appears to have established some of the important transition cues for discernment of bilabial synthetic stops versus glides. However, the stop/glide transitions studied have typically been more stylized than those found in natural speech. This investigation examined the importance of transitions to listeners' identification of initial stops and glides in spoken /CVk/ syllables. Performance was assessed for the stops and glides with progressive deletion of segments from the syllables' onsets. Bilabial and velar stops and glides as well as alveolar stops were tested in /Cuk/, /Cak/, /Ca:k/ contexts to examine differences in transition use among phoneme environments. Twelve normal-hearing young adults participated as listeners. In general, when the initial stop bursts were deleted, the F2 transition frequency extent was significantly correlated with subjects' consonant identification response patterns. That is, longer F2 frequency extents yielded a higher percentage of glide responses. In addition, shorter F2 frequency extents resulted in a higher proportion of "no initial consonant" responses. Neither F2 transition duration nor F1 transition duration/frequency extent significantly correlated with the subjects' consonant identifications.

1aSC23. The role of formant synchrony in the coherence of vowels. Peter C. Gordon and Erika Manning (Dept. of Psych., Univ. of North Carolina, Chapel Hill, NC 27599-3270)
The coherence of vowels as auditory objects was studied by comparing identification thresholds in noise for synthetic vowel sounds (differing only in the center frequency of a single formant) to identification thresholds for the distinctive formant presented in isolation. The bandwidth of the noise masker was limited so that it only interfered with perception of the distinctive formant. Thresholds for accurately identifying the vowel sounds were lower than those for identifying the isolated formant. This demonstrates that vowel sounds cohere in the sense that unmasked formants reduce the masking of a formant embedded in noise. The advantage of a complete vowel over an isolated formant appears to depend on the temporal alignment of the formants. When the onset of the distinctive formant coincides with the offset of the other formants, then listeners can still identify the vowel sound in modest amounts of noise. However, in this case thresholds are not lower for vowel identifications than for identifications of isolated formants. This indicates that temporal synchrony plays a basic role in the psychoacoustic coherence of vowels.

1aSC24. Dynamic and static properties of imaged speech sounds. Deborah A. Gagnon (Moss Rehab. Res. Inst., 1200 W. Tabor Rd., Philadelphia, PA 19141)
The type of information stored in memory for speech sounds was tested using a primed, speeded classification task. The relationship between prime and target was varied in terms of phoneme constituency, phoneme order, or both. Primes were presented either auditorally or visually, allowing for a contrast between perceptual and imaged speech codes. Two other manipulations were made to assess whether the temporal nature of the stimuli, the stimulus quality, or possibly both, play a role in determining imageability: (1) Stimuli either contained stop (dynamically cued) or fricative (relatively statically cued) consonants; and (2) stimuli were either natural or synthetic. Inhibitory effects were found when an auditory prime was presented at a 100-ms ISI, supporting earlier evidence for a positionally specific perceptual speech code (Gagnon and Sawusch, 1992). It was also found that both the manipulation of the type of consonant (stop versus fricative) present in the target and the quality of the stimulus set (natural versus synthetic) had an effect on imageability, supporting both a temporal nature (Surprenant, 1992) and stimulus quality account of imageability. These results will be discussed within the context of current theories of memory, imagery, and speech perception. [Work supported by NIDCD Grant R01 DC00219 to SUNY at Buffalo and Mark Diamond Research Fund grant to Deborah A. Gagnon.]

J. Acoust. Soc. Am., Vol. 97, No. 5, Pt. 2, May 1995

1aSC25. Quantitative measures of envelope cues in speech recognition. Johnny Saade (Dept. of Elec. Eng., Univ. of California, Los Angeles and House Ear Inst., Los Angeles, CA 90057), Fan-Gang Zeng, John J. Wygonski, Robert V. Shannon, Sigfrid D. Soli (House Ear Inst., Los Angeles, CA 90057), and Abeer Alwan (Univ. of California, Los Angeles, CA)
A quantitative procedure is derived to evaluate the relative contribution of envelope cues to speech recognition. Recognition data of 16 consonants in the /aCa/ form were collected using signal-correlated noise stimuli in seven normal-hearing listeners. Several distance measures were calculated directly from duration and amplitude of the acoustic envelope. One amplitude distance measure was the Euclidean distance which was computed from the squared difference of the sample-by-sample amplitudes. The second measure was the envelope difference index (EDI) [Fortune et al., Ear Hear. 15, 93-95 (1994)] which was computed from the absolute value of the difference of the sample-by-sample amplitudes. A multidimensional scaling analysis was used to convert the perceptual confusion matrix into a distance matrix and to normalize the different distance measures. Correlation coefficients were computed between the different distance measures and the perceptual data. Preliminary analysis of data from six stop consonants showed that the consonant duration alone is sufficient to explain the perceptual data (r=0.92). Although Euclidean distance conveyed less information (r=0.75) than duration, it was a better measure than the EDI (r=0.31). Evaluation of these measures on the full 16 consonant set will be discussed.
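
The two amplitude distance measures and the correlation step in 1aSC25 can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract states only that the Euclidean measure uses squared sample-by-sample differences and the EDI uses absolute differences. Taking the square root, scaling each envelope to unit mean, and dividing by 2N are assumptions made here, and all function names are hypothetical.

```python
import math


def euclidean_distance(env_a, env_b):
    """Euclidean distance from squared sample-by-sample differences.

    Taking the square root of the sum is an assumption; the abstract
    only mentions the squared difference.
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(env_a, env_b)))


def envelope_difference_index(env_a, env_b):
    """EDI-style measure from absolute sample-by-sample differences.

    Assumed normalization: each envelope is scaled to a mean of 1 and
    the summed absolute difference is divided by 2N, which keeps the
    result between 0 and 1 for non-negative envelopes.
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    n = len(env_a)
    mean_a = sum(env_a) / n
    mean_b = sum(env_b) / n
    if mean_a == 0 or mean_b == 0:
        raise ValueError("envelopes must have a non-zero mean")
    total = sum(abs(a / mean_a - b / mean_b) for a, b in zip(env_a, env_b))
    return total / (2 * n)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy envelopes (made-up values, not data from the study).
env_1 = [0.0, 0.5, 1.0, 0.5]
env_2 = [0.0, 1.0, 0.5, 0.5]
print(euclidean_distance(env_1, env_2))
print(envelope_difference_index(env_1, env_2))
```

In a full analysis along the lines of the abstract, each acoustic measure would be computed for every consonant pair and then correlated, for example with `pearson_r`, against the perceptual distances derived from the confusion matrix.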

1aSC26. Onset-sensitive time-frequency masking and its application to speech recognition. Kiyoaki Aikawa (ATR Human Information Process. Res. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan)
This paper proposes an onset-sensitive time-frequency masking mechanism in order to improve dynamic feature extraction. Application of the proposed mechanism to Japanese 23-phoneme recognition using hidden Markov models demonstrated that onset-sensitive MASP outperforms time-invariant MASP. Masked Spectrum (MASP) [Aikawa et al., Proc. ICASSP 93 II, 668-671 (1993)] is a new spectral representation incorporating time-frequency forward masking and has been reported to provide excellent performance when used for speaker-dependent and speaker-independent speech recognition. The masking pattern production mechanism was previously modeled by a time-invariant time-frequency filter, but the masking level rises at the onsets and offsets in a speech sound [T. Hirahara, J. Acoust. Soc. Jpn. E12 (2), 57-68 (1991); E. Miyasaka, J. Acoust. Soc. Jpn. 39 (9), 614-623 (1983)]. This phenomenon suggests that an adaptive masking mechanism is effective for balancing instantaneous and transitional spectral features depending on vowels or consonants. The masking pattern is calculated as the weighted sum of the smoothed preceding spectra obtained by time-distance-dependent spectral smoothing lifters. The masking level is controlled by the slope of the temporal contour of the instantaneous sound energy. The masked spectrum is obtained by subtracting the masking pattern from the current spectrum. Onset-offset-sensitive masking models are also examined.
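
The masking computation described in 1aSC26 might be sketched as below. This is an illustrative reading of the abstract only: the moving-average smoother stands in for the spectral smoothing lifters, and the weights, the `onset_gain` parameter, and the rule that raises the masking level on rising energy are all assumptions rather than details of the MASP model.

```python
def smooth(frame, half_width):
    """Moving average across frequency bins.

    A simple stand-in (assumption) for a spectral smoothing lifter;
    a wider window gives stronger smoothing.
    """
    n = len(frame)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(frame[lo:hi]) / (hi - lo))
    return out


def masked_spectra(frames, energies, weights=(0.5, 0.25), onset_gain=1.0):
    """Subtract an onset-sensitive masking pattern from each frame.

    frames:     list of spectra, one list of bin values per time frame
    energies:   instantaneous sound energy per frame
    weights:    hypothetical weights for the 1st, 2nd, ... preceding frame
    onset_gain: hypothetical factor scaling the masking level with the
                positive slope of the energy contour
    """
    masked = []
    for t, frame in enumerate(frames):
        # Masking pattern: weighted sum of smoothed preceding spectra.
        # Smoothing width grows with the time distance k.
        pattern = [0.0] * len(frame)
        for k, w in enumerate(weights, start=1):
            if t - k < 0:
                break
            prev = smooth(frames[t - k], half_width=k)
            pattern = [p + w * s for p, s in zip(pattern, prev)]
        # Masking level controlled by the slope of the energy contour.
        slope = energies[t] - energies[t - 1] if t > 0 else 0.0
        level = 1.0 + onset_gain * max(0.0, slope)
        # Masked spectrum: current spectrum minus scaled masking pattern.
        masked.append([c - level * p for c, p in zip(frame, pattern)])
    return masked
```

Setting `onset_gain` to 0 reduces this sketch to a time-invariant masking filter, which corresponds to the baseline the abstract compares against. An onset-offset-sensitive variant could use the absolute value of the slope instead of its positive part.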

1aSC27. Difference limens for vowel-vowel formant transitions. William A. Ainsworth (Dept. of Commun. and Neurosci., Keele Univ., Keele, Staffordshire ST5 5BG, United Kingdom)
Second-formant transitions in vowel-vowel utterances are not always of the same duration as those of the first formant and they often begin and end at different instants. In other cases the formant frequencies sometimes first move in a different direction from their final targets. In order to investigate whether these formant movements are perceptually significant, a number of difference limens for formant transitions have been measured for synthesized versions of the vowel pair /a/-/i/. It was found that differences in duration between the first- and second-formant transitions of up to 70 ms were not perceived. It was also found that delays between the starts and ends of the first and second transitions of up to 50 ms were not perceived. These results suggest that the differences in durations and delays between the first and second formants found in natural vowel-vowel utterances are unlikely to be of perceptual significance. [Work supported by EC Science Contract SC1-CT92-0786.]
129th Meeting: Acoustical Society of America