4. RHYTHM, PROSODY, LANGUAGE

MUSIC 318 MINI-COURSE ON SPEECH AND SINGING
4. RHYTHM, PROSODY, TONE, LANGUAGE
Science of Sound, Chapter 16
Springer Handbook of Acoustics, Chapter 16
RHYTHM
A STRIKING CHARACTERISTIC OF A FOREIGN LANGUAGE IS ITS RHYTHM.
ENGLISH, RUSSIAN, ARABIC AND THAI ARE STRESS-TIMED LANGUAGES. STRESSED
SYLLABLES RECUR AT APPROXIMATELY EQUAL INTERVALS. SYLLABLES MOST
OFTEN END WITH A CONSONANT.
FRENCH, SPANISH, GREEK, ITALIAN, YORUBA AND TELUGU ARE SYLLABLE-TIMED
LANGUAGES. SYLLABLES RECUR AT APPROXIMATELY EQUAL INTERVALS.
SYLLABLES OFTEN END WITH A VOWEL.
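One rough way to see what "approximately equal intervals" means is to measure how much the gaps between successive onsets vary. The sketch below is only an illustration: the onset times and stress flags are hypothetical, and the coefficient of variation is an assumed yardstick, not a measure taken from the course material.

```python
from statistics import mean, pstdev

def interval_variability(onset_times):
    """Coefficient of variation (std / mean) of the gaps between successive onsets.

    A value near 0 means the events recur at nearly equal intervals.
    """
    gaps = [b - a for a, b in zip(onset_times, onset_times[1:])]
    return pstdev(gaps) / mean(gaps)

# Hypothetical syllable onsets (seconds) with a flag marking stressed syllables.
syllables = [
    (0.00, True), (0.15, False), (0.50, True), (0.62, False),
    (0.74, False), (1.00, True), (1.30, False), (1.50, True),
]

all_onsets = [t for t, _ in syllables]
stressed_onsets = [t for t, stressed in syllables if stressed]

syllable_cv = interval_variability(all_onsets)
stress_cv = interval_variability(stressed_onsets)

print(f"syllable-to-syllable variability: {syllable_cv:.2f}")
print(f"stress-to-stress variability:     {stress_cv:.2f}")
```

In this made-up utterance the stressed syllables are evenly spaced while the syllables themselves are not, which is the pattern the stress-timed description predicts. A syllable-timed utterance would show the opposite tendency.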
RHYTHMIC PATTERNS CAN BE USED TO SIGNAL DIFFERENCES IN SYNTACTIC
STRUCTURE. COMPARE:
1. The 2000-year-old skeletons
2. The two 1000-year-old skeletons
PROSODY
IN LINGUISTICS, PROSODY IS THE RHYTHM, STRESS, AND INTONATION OF SPEECH.
PROSODY MAY REFLECT VARIOUS FEATURES OF THE SPEAKER OR THE
UTTERANCE: THE EMOTIONAL STATE OF THE SPEAKER; WHETHER THE UTTERANCE
IS A STATEMENT, A QUESTION, OR A COMMAND; WHETHER THE SPEAKER IS BEING
IRONIC OR SARCASTIC; AND EMPHASIS, CONTRAST, AND FOCUS.
IN TERMS OF ACOUSTICS, THE PROSODICS OF ORAL LANGUAGES INVOLVE
VARIATION IN SYLLABLE LENGTH, LOUDNESS, PITCH, AND THE FORMANT
FREQUENCIES OF SPEECH SOUNDS.
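Two of the acoustic quantities named above, loudness and pitch, can be estimated from a short frame of samples. The sketch below is a minimal illustration, assuming a synthetic test tone in place of recorded speech. RMS amplitude stands in for loudness and a plain autocorrelation peak search stands in for pitch tracking. The function names and the 75-400 Hz search range are assumptions made for this example.

```python
import math

def rms(samples):
    """Root-mean-square amplitude, a simple physical correlate of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency from the strongest autocorrelation peak."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic "voiced" frame: a 200 Hz fundamental plus a weaker second harmonic.
sample_rate = 8000
f0 = 200.0
frame = [
    math.sin(2 * math.pi * f0 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sample_rate)
    for n in range(400)
]

print(f"RMS amplitude: {rms(frame):.3f}")
print(f"Estimated F0:  {estimate_f0(frame, sample_rate):.1f} Hz")
```

Real speech is noisier and changes over time, so practical pitch trackers apply this kind of analysis frame by frame and add safeguards that are omitted here.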
PROSODY IS OF GREAT INTEREST IN AUTOMATIC SPEECH RECOGNITION.
DECLARATIVE, INTERROGATIVE, IMPERATIVE
DECLARATIVE: “You are going home.”
INTERROGATIVE: “You are going home?” (voice is raised at end of sentence)
IMPERATIVE: “You ARE going home!” (“are” is emphasized)
EMOTIONAL STATE OF THE SPEAKER
PROSODIC FEATURES TEND TO INDICATE THE EMOTIONAL STATE OF THE SPEAKER.
“RAISING ONE’S VOICE” IN ANGER, FOR EXAMPLE, INCREASES BOTH LOUDNESS AND
PITCH.
A STATE OF EXCITEMENT FREQUENTLY CAUSES AN INCREASE IN THE RATE OF
SPEAKING. ATTEMPTS HAVE BEEN MADE TO ACCOMPLISH ACOUSTIC “LIE
DETECTION” BY ANALYZING THE PROSODIC FEATURES OF RECORDED SPEECH FOR
EVIDENCE OF STRESS.
EFFECT OF EMOTION ON PHONATION FREQUENCY
PHONATION FREQUENCY vs TIME FOR THREE ACTORS SPEAKING THE SAME SENTENCE
(“For God’s sake!”) IN FOUR DIFFERENT MODES (Williams and Stevens 1972)
EFFECT OF EMOTION ON PHONATION FREQUENCY
MEDIAN AND RANGE OF THE PHONATION FREQUENCY FOR THREE ACTORS SPEAKING
THE SAME SENTENCE: S=SORROW; N=NEUTRAL; F=FEAR; A=ANGER
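The median and range summarized in a figure like this can be computed directly from a phonation-frequency track. The sketch below assumes a track given as one value per analysis frame, with 0 marking unvoiced frames. The numbers are hypothetical placeholders, not the data of Williams and Stevens.

```python
from statistics import median

def summarize_f0(track_hz):
    """Return the median and range of the voiced frames in an F0 track (Hz)."""
    voiced = [f for f in track_hz if f > 0]  # 0 marks an unvoiced frame
    return {"median": median(voiced), "low": min(voiced), "high": max(voiced)}

# Hypothetical F0 tracks for one speaker in two modes (values are made up).
tracks = {
    "N": [0, 118, 120, 122, 119, 115, 0],  # neutral: narrow range
    "A": [0, 180, 230, 260, 210, 170, 0],  # anger: higher and wider
}

for mode, track in tracks.items():
    s = summarize_f0(track)
    print(f"{mode}: median {s['median']} Hz, range {s['low']}-{s['high']} Hz")
```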
RADIO ANNOUNCER SPEAKING BEFORE (top) AND AFTER (bottom) THE CRASH OF
THE HINDENBURG DIRIGIBLE (1937)
STRESS
SPECTROGRAMS OF THE WORD “SQUEAL” SPOKEN WITH FOUR DEGREES OF STRESS
IN RESPONSE TO A LIST OF QUESTIONS (Brownlee 1996)
TONE
IN SOME LANGUAGES, SUCH AS CHINESE, A SYLLABLE CAN TAKE ON DIFFERENT
MEANINGS DEPENDING ON ITS TONE.
THE FOUR TONES IN MANDARIN CHINESE ARE SHOWN.
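As a rough numerical picture of the four tones, the sketch below uses the conventional five-level tone numerals (55, 35, 214, 51), which are a standard textbook description rather than something stated in these slides. The mapping from levels to hertz and the 100-200 Hz speaker range are assumptions chosen only for illustration.

```python
# Conventional five-level descriptions of the Mandarin tones
# (1 = lowest pitch level, 5 = highest), e.g. for the syllable "ma":
#   tone 1 (high level)  "mother"    tone 2 (rising)   "hemp"
#   tone 3 (dipping)     "horse"     tone 4 (falling)  "scold"
TONE_LEVELS = {1: (5, 5), 2: (3, 5), 3: (2, 1, 4), 4: (5, 1)}

def level_to_hz(level, low_hz=100.0, high_hz=200.0):
    """Map a pitch level 1..5 onto an assumed speaker range, spaced logarithmically."""
    return low_hz * (high_hz / low_hz) ** ((level - 1) / 4)

def tone_contour(tone, low_hz=100.0, high_hz=200.0):
    """Return the target pitches (Hz) that outline one tone's contour."""
    return [round(level_to_hz(lv, low_hz, high_hz), 1) for lv in TONE_LEVELS[tone]]

for tone in sorted(TONE_LEVELS):
    print(f"tone {tone}: {tone_contour(tone)} Hz")
```

Actual contours vary with speaker, context, and neighboring tones, so these targets should be read as a schematic only.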
VOICE QUALITY
VOICE QUALITY IS A BROAD TERM THAT REFERS TO THE EXTRALINGUISTIC ASPECTS OF A
SPEAKER’S VOICE WITH REGARD TO IDENTITY, PERSONALITY, HEALTH, AND EMOTIONAL
STATE.
VOCAL FOLD MASS, VOCAL TRACT LENGTH, TRACHEAL LENGTH, JAW AND TONGUE SIZE,
AND NASAL CAVITY VOLUME MAY INDICATE INFORMATION ABOUT AGE, SEX, PHYSIQUE,
AND HEALTH.
“High fidelity on the line: please say ‘ahh’”
THIS IS THE TITLE OF AN INTERESTING ARTICLE BY STEN TERNSTRÖM IN THE FALL 2008
ISSUE OF ECHOES.
SPECTRA OF SPEECH SOUNDS ARE ESPECIALLY RICH UP TO 4000 Hz, AND FALL OFF
RAPIDLY ABOVE 5000 Hz. BUT HIGH HARMONICS CAN BE MEASURED UP TO 20 kHz.
EARLY TELEPHONES TRANSMITTED ONLY 300-3500 Hz WITH LITTLE LOSS IN
INTELLIGIBILITY (SEE FILTERED SPEECH IN LESSON 3). IN 2000, A WIDE-BAND
STANDARD FOR TELEPHONY WAS DEFINED UP TO 7000 Hz, A BIG IMPROVEMENT OVER
THE OLD “TELEPHONE SOUND.” HOPEFULLY CELL-PHONE SOUND WILL SOON SOUND
MUCH BETTER.
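A quick way to compare the two telephone bands is to count which harmonics of a voiced sound fall inside each one. The sketch below assumes a 100 Hz fundamental and takes 50 Hz as the lower edge of the wide band. The slides give only the 7000 Hz upper limit, so that lower edge is an assumption.

```python
def harmonics_in_band(f0_hz, low_hz, high_hz, limit_hz=20000):
    """List the harmonics of f0_hz (up to limit_hz) that lie within [low_hz, high_hz]."""
    count = limit_hz // f0_hz
    return [k * f0_hz for k in range(1, count + 1) if low_hz <= k * f0_hz <= high_hz]

f0 = 100  # assumed fundamental of a low-pitched voice, in Hz

narrow = harmonics_in_band(f0, 300, 3500)  # early telephone band from the text
wide = harmonics_in_band(f0, 50, 7000)     # wide band; the 50 Hz lower edge is assumed

print(f"narrow band passes {len(narrow)} harmonics, fundamental included: {f0 in narrow}")
print(f"wide band passes   {len(wide)} harmonics, fundamental included: {f0 in wide}")
```

The count says nothing about how strong each harmonic is. It only shows how much of the harmonic series each band admits.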
VOICES HEARD IN LIVE PERFORMANCE MAY SOUND A LITTLE “DULL” OR “FADED”
BEYOND THE 15TH ROW, BECAUSE HIGH FREQUENCIES ARE SLIGHTLY DIMINISHED.
NORMAL, “YAWNY”, AND “TWANGY” VOICE
Story, Titze, and Hoffman (2001) did a 3-dimensional study of the vocal tract using MRI to
determine the shape when vowels /i/, /æ/, /ɑ/, and /u/ were spoken with NORMAL,
“YAWNY”, and “TWANGY” voice.
Relative to NORMAL speech, the ORAL CAVITY is widened and the TRACT is lengthened for
YAWNY vowels. F1 and F2 moved closer together.
TWANGY vowels were characterized by shortened TRACT length, widened LIP OPENING,
and a slightly constricted ORAL CAVITY. F1 and F2 moved farther apart.
(Story, Titze, and Hoffman, 2001)
ACCENTS
“TWO COUNTRIES SEPARATED BY A COMMON LANGUAGE”
Have you ever misunderstood someone or been misunderstood by someone who
speaks with a different accent? The sounds that an American hears as 'Bob the clerk'
may be heard by an Australian as 'barb the clock'.
The two most important parameters in determining different vowel sounds are the first two
formants, which are frequency bands with increased power. These are the two axes on the
graph. The axes are traditionally plotted backwards, as here, so that they approximately
correspond to the axes long used by phoneticians and linguists: F1 (vertical) approximately
corresponds to the jaw height (which correlates negatively with the extent of the mouth
opening). F2 (horizontal) approximately corresponds to the position (forward or back) of the
constriction of the vocal tract where the tongue is close to the roof of the mouth. Other
important parameters are the length of the vowel and the other formants.
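The reversed-axis convention described above can be sketched as a mapping from formant frequencies to chart coordinates. In the example below the axis ranges and the formant values are rough, hypothetical figures chosen for illustration. They are not taken from the survey data, and the function name is an assumption.

```python
def chart_position(f1_hz, f2_hz, f1_range=(200, 900), f2_range=(600, 2600)):
    """Map (F1, F2) to chart coordinates in [0, 1] with both axes reversed.

    x = 0 is the left edge (highest F2, constriction toward the front);
    y = 0 is the top edge (lowest F1, jaw high), and y grows downward.
    """
    x = (f2_range[1] - f2_hz) / (f2_range[1] - f2_range[0])
    y = (f1_hz - f1_range[0]) / (f1_range[1] - f1_range[0])
    return x, y

# Hypothetical formant values (Hz) for three vowels, for illustration only.
vowels = {"heed": (280, 2250), "hard": (750, 1200), "who'd": (320, 900)}

for word, (f1, f2) in vowels.items():
    x, y = chart_position(f1, f2)
    print(f"{word:6s} x = {x:.2f} (0 = front), y = {y:.2f} (0 = jaw high)")
```

With this layout a front, close vowel lands near the top left and an open vowel near the bottom, matching the arrangement phoneticians have long used.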
F1 AND F2 FOR ENGLISH VOWEL SOUNDS SPOKEN BY AUSTRALIAN SPEAKERS
F1 CORRELATES WITH MOUTH OPENING; F2 CORRELATES WITH TONGUE PLACEMENT
AUSTRALIAN SPEAKER
For the Australians in this sample, the
words "hud" and "hard" have a
similar sound; the main difference is
the length. For this sample of
Americans, it is "hud" and "heard"
that are distinguished by length. For
an Australian, a long "bud" is a "bard"; for
an American, it's a "bird".
AMERICAN SPEAKER
TO PARTICIPATE IN THIS SURVEY BY WOLFE, SMITH AND COLLEAGUES, CLICK ON
http://project.phys.unsw.edu.au/swe/survey/form.php