D6.1 VOCAL TRACT REPLICAS AND ACOUSTIC MEASUREMENTS.
Xavier Pelorson, Rémi Blandin, Annemie Van Hirtum and Xavier Laval
Gipsa-lab, 11 rue des Mathématiques, Grenoble Campus, Saint Martin d'Hères, France
e-mail: [email protected],e-mail: [email protected], e-mail: [email protected],
email : [email protected]
The goal of this work is to provide accurate and extensive acoustic measurements on various
vocal tract replicas in order to validate the numerical simulations performed in WP5. This
task first required the development of a dedicated measurement set-up to acquire the acoustic
pressure inside vocal tract replicas, either at specific positions or over a whole surface, using a 3D stage
positioning system. The second major challenge was the optimization of each element of this set-up
and of the post-processing of the acquired data. In close collaboration with
WP5, measurements have been performed on vocal tract replicas of increasing complexity.
The comparisons with the numerical simulations performed in WP5 were complemented by
comparisons with theoretical predictions obtained from simple acoustical theory.
Version: V1
Date: 05/03/2014
WP number: 6
WP leader: Xavier Pelorson
WP leader email: [email protected]
EUNISON is supported by the Future and Emerging Technologies (FET) programme within the ICT theme of the Seventh Framework Programme for Research of the European Commission
EUNISON: Extensive UNIfied-domain SimulatiON of the human voice
Contents
1. Introduction
2. Experimental setup
   2.1 Setup
   2.2 Vocal tract replicas and acoustic excitation
   2.3 Acoustic pressure modulus and phase estimation
3. Theory
   3.1 Plane wave theory
       3.1.1 One tube
       3.1.2 Two tubes
4. Transfer function measurement
   4.1 Transfer function measurement method
   4.2 Problems encountered when trying to perform measurements near the source
   4.3 Measured transfer functions
       4.3.1 One tube replica
       4.3.2 Two tubes replica
       4.3.3 Vowels
5. Reflection coefficient estimation
   5.1 The two microphone method
   5.2 Reflection coefficient estimation from experimental data
6. Surface measurement
   6.1 One tube replica
   6.2 Two tube replica
7. Conclusion
References
A. Article: three dimensional vocal tract acoustics
1. Introduction
A very common geometrical approximation of the vocal tract consists of a succession
of tubes having different cross sections and sharing the same axis. This approximation implicitly
assumes plane wave propagation. However, the human vocal tract is not perfectly axisymmetric,
and transverse modes can be generated and take part in the propagation of sound in the frequency
range of interest for speech production. A succession of tubes that takes
the eccentricity of the vocal tract and the transverse propagation modes into account may therefore be a better
approximation.
To investigate to what extent the plane wave assumption is accurate, transfer
functions and pressure patterns at a given frequency are measured inside vocal tract replicas.
Several replicas of increasing complexity are studied. As a particularly illustrative example, two
replicas made of a succession of two tubes are compared: one in which both tubes share the same axis,
and one in which the axes differ. These measurements are compared with plane wave
acoustic theory and with Finite Element Method (FEM) simulations.
The experimental setup used to perform the measurements is presented first. Then, the simple
acoustic theory for one-tube and two-tube geometries is introduced. Afterwards, the transfer
function estimation method is explained and the experimental results are presented and discussed. An
estimation of the reflection coefficient for a one tube replica is then described, and the experimental data
are compared with theory. Finally, the surface measurements confirm
the observations and the assumptions made from the transfer function measurements.
2. Experimental setup
2.1 Setup
The acoustic pressure inside a vocal tract replica is measured with a dedicated experimental setup.
It is composed of an acoustic source, a probe microphone (B&K type 4182 with a 200 mm long
and 1 mm wide probe) moved by a 3-axis positioning system (OWIS PS35), and an anechoic room [4]
(1.92 x 1.95 x 1.99 m, volume 7.45 m^3) (see figure 2a). A BNC board connects the electrical signals to
a computer containing a data acquisition card (PCI-MIO 16XE) (see figure 1). Data acquisition is
controlled using LabVIEW.
The positioner is used to measure the pressure at various locations inside and outside the
vocal tract replica. The source generates sinusoidal signals at given frequencies. The setup is
placed in the anechoic room and acoustic foam is placed under the screen to avoid reflections.
The temperature is measured for each experiment with a thermometer placed in the anechoic room.
For a given position of the probe or frequency of the source, the pressure and the source input
voltage signals are recorded for about 1 s. The modulus and phase of the
pressure are computed from these recordings.
2.2 Vocal tract replicas and acoustic excitation
Six different vocal tract replicas, of increasing geometrical complexity, have been constructed
for this study:
• a simple uniform straight tube of dimensions 29.5 × 170 mm (see figure 3a).
• a two-tube cascade with dimensions 14 × 85 mm for the first tube and 29.5 × 85 mm for the
second one, in two geometrical configurations. In the first one, the centric configuration, both tubes share the same axis of revolution, while in the second one, the
eccentric configuration, the axes of revolution are different (see figures 3b and 3c).
• three 3D-printed replicas, made of rigid acrylic, of vocal tract geometries corresponding to the vowels /a/, /i/ and /u/.
These geometries have been taken from the literature [3] (see figure 4).
Figure 1: Experimental setup.
Figure 2: Experimental setup. (a) 3D positioning system inside the anechoic room. (b) Connection between the source and the replica.
The replicas were excited acoustically using compression chambers. In order to
cover the full frequency range of interest for speech (i.e. up to 10 kHz), two compression chambers
were used: a Monacor KU-916T for the lowest frequency range (50 Hz - 2 kHz) and an Eminence
PSD:2002S-8 for the highest frequencies. In order to prevent acoustic interference, the sound
source was located outside the anechoic room. The sound source was connected to the
vocal tract replicas through an adaptation part (see figure 2b). The acoustic excitation
was therefore radiated into the replicas through a hole of 1 mm diameter.
Figure 3: One and two tubes replicas. (a) One tube. (b) Two centric tubes. (c) Two eccentric tubes.
Figure 4: Vowels replicas. (a) Vowel /a/ replica. (b) Vowel /i/ cut view.
2.3 Acoustic pressure modulus and phase estimation
Despite all the care taken during the measurements, the recorded signals can be altered by
spurious phenomena. The presence of noise cannot be excluded and harmonic distortion of the acoustic
source is unavoidable. In addition, the signals can have a continuous (DC) component, and
transients can occur when the frequency changes and when the source starts to generate
sound.
To limit the effect of these artefacts, careful signal processing is performed. The first 200 ms are
removed to avoid transient phenomena, then the Fourier transform of the signal is computed. The
spectrum amplitude is normalised by multiplying it by 2/N (N being the number of samples of the
analysed signal). The Fourier transform is computed using zero-padding to get a frequency spacing
finer than 0.1 Hz.
The maximum of the spectrum is searched for in a frequency band centered on the expected signal frequency. The frequency and the phase corresponding to this maximum are then extracted. A
parabolic interpolation is performed on the 3 points closest to the maximum to get a better estimate of
the amplitude (the sample with the largest amplitude is not necessarily the maximum of the Fourier
transform).
Figure 5: Diagram of a junction between two simple tubes.
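The processing chain of section 2.3 can be sketched as follows. This is a minimal illustration, not the code used for the measurements: the function name `estimate_tone`, the search half-band of 5 Hz and the explicit mean removal are assumptions made for the example. Instead of a zero-padded FFT, the spectrum is evaluated directly on a 0.1 Hz grid around the expected frequency, which samples the same Fourier transform in that band.

```python
import cmath
import math


def estimate_tone(signal, fs, f_expected, skip_s=0.2, half_band=5.0, df=0.1):
    """Return (frequency, amplitude, phase) of the dominant sinusoid.

    signal      -- list of samples
    fs          -- sampling frequency in Hz
    f_expected  -- frequency sent to the source, centre of the search band
    skip_s      -- duration removed at the start to discard transients
    half_band   -- half-width of the search band in Hz (illustrative value)
    df          -- frequency spacing of the evaluated spectrum in Hz
    """
    x = signal[int(round(skip_s * fs)):]
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove the continuous component

    def spectrum(f):
        # Fourier transform evaluated at f and scaled by 2/N, so that a
        # sinusoid of amplitude A yields a peak of height close to A.
        w = -2j * math.pi * f / fs
        return (2.0 / n) * sum(v * cmath.exp(w * i) for i, v in enumerate(x))

    steps = int(round(half_band / df))
    freqs = [f_expected + m * df for m in range(-steps, steps + 1)]
    spec = [spectrum(f) for f in freqs]
    mags = [abs(s) for s in spec]

    # Largest point of the band (edges excluded so both neighbours exist).
    i = max(range(1, len(mags) - 1), key=lambda j: mags[j])

    # Parabola through the maximum and its two neighbours.
    a, b, c = mags[i - 1], mags[i], mags[i + 1]
    curvature = a - 2.0 * b + c
    shift = 0.5 * (a - c) / curvature if curvature != 0.0 else 0.0
    frequency = freqs[i] + shift * df
    amplitude = b - 0.25 * (a - c) * shift
    phase = cmath.phase(spec[i])
    return frequency, amplitude, phase
```

The returned phase is relative to the first sample kept after the transient is removed (cosine convention). Since the pressure and the supply voltage are processed identically, this offset cancels when their phase difference is taken.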
3. Theory
3.1 Plane wave theory
3.1.1 One tube
The wave field produced by a source located in a single uniform tube can be described
at any abscissa $x$ by an acoustic pressure and a flow of the following form (a time factor $e^{-j\omega t}$ is
understood throughout this part):
$$P = A\left(e^{-jkx} + R\,e^{jkx}\right), \qquad U = Z_c^{-1} A\left(e^{-jkx} - R\,e^{jkx}\right) \tag{1}$$
where $A$ is an amplitude factor, $R$ is a reflection coefficient, $k = 2\pi f/c$ is the wave number ($f$
being the frequency and $c$ the sound speed) and $Z_c = \rho c/S$ is the characteristic impedance ($\rho$ being
the air density and $S$ the tube cross section area). At the exit of the tube (abscissa $x = 0$) the radiation
impedance $Z_r$ gives the following boundary condition:
$$\frac{P(0)}{U(0)} = Z_r = \frac{1+R}{Z_c^{-1}(1-R)} \tag{2}$$
This equation gives the expression of the reflection coefficient $R$:
$$R = \frac{Z_r/Z_c - 1}{Z_r/Z_c + 1} \tag{3}$$
At the source ($x = x_s$), the following condition is satisfied:
$$U(x_s) = U_s \tag{4}$$
where $U_s$ is the amplitude of the acoustic source. This leads to the expression of $A$:
$$A = \frac{U_s Z_c}{e^{-jkx_s} - R\,e^{jkx_s}} \tag{5}$$
The viscothermal losses can be taken into account using a complex wavenumber.
3.1.2 Two tubes
The wave field produced by a source located in a segmented two-tube waveguide can be described at any abscissa $x$ by the following equations:
$$\begin{cases}
P_1 = A_1\left(e^{-jkx} + R_1 e^{jkx}\right) \\
U_1 = Z_{c1}^{-1} A_1\left(e^{-jkx} - R_1 e^{jkx}\right) \\
P_2 = A_2\left(e^{-jkx} + R_2 e^{jkx}\right) \\
U_2 = Z_{c2}^{-1} A_2\left(e^{-jkx} - R_2 e^{jkx}\right)
\end{cases} \tag{6}$$
where index 1 refers to tube 1 and index 2 to tube 2 (see figure 5). The continuity of pressure
and the conservation of the acoustic flow at the junction (abscissa $x = 0$) give two equations:
$$A_1(1 + R_1) = A_2(1 + R_2) \tag{7a}$$
$$S_1 A_1(1 - R_1) = S_2 A_2(1 - R_2) \tag{7b}$$
where $S_1$ and $S_2$ are the cross section areas of tubes 1 and 2. Dividing (7b) by $S_1$ and adding
the result to (7a) gives a relationship between $A_1$ and $A_2$:
$$A_1 = \frac{1}{2} A_2\left[1 + R_2 + \frac{S_2}{S_1}(1 - R_2)\right] = C A_2 \tag{8}$$
Dividing (7a) by (7b) leads to a relationship between $R_1$ and $R_2$ (which is equivalent to writing the
equality of the impedances on both sides of the junction):
$$R_1 = \frac{S_1(1 + R_2) - S_2(1 - R_2)}{S_1(1 + R_2) + S_2(1 - R_2)} \tag{9}$$
The reflection coefficient $R_2$ can be found from the boundary condition at the exit:
$$\frac{P_2(l_2)}{U_2(l_2)} = Z_r \tag{10}$$
Thus:
$$R_2 = e^{-2jkl_2}\,\frac{Z_r/Z_{c2} - 1}{Z_r/Z_{c2} + 1} \tag{11}$$
At the source, assuming that the source is located inside tube 1, the following condition is
satisfied:
$$U_1(x_s) = Z_{c1}^{-1} A_1\left(e^{-jkx_s} - R_1 e^{jkx_s}\right) = U_s \tag{12}$$
where $U_s$ is the amplitude of the source flow. This leads to the following expression for the
amplitude factor $A_1$:
$$A_1 = \frac{U_s Z_{c1}}{e^{-jkx_s} - R_1 e^{jkx_s}} \tag{13}$$
If the source is located inside tube 2, the following relationship holds:
$$A_2 = \frac{U_s Z_{c2}}{e^{-jkx_s} - R_2 e^{jkx_s}} \tag{14}$$
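As an illustration of how equations (8), (9), (11) and (13) chain together, the following sketch computes the coefficients of the two-tube model for a source located in tube 1. It is only a sketch under stated assumptions: the function names, the default air density (1.2 kg/m^3) and sound speed (343 m/s) are illustrative choices, the radiation impedance `zr` must be supplied by the caller, and losses are ignored (real wavenumber).

```python
import cmath
import math


def two_tube_coefficients(f, s1, s2, l2, zr, xs, us=1.0, rho=1.2, c=343.0):
    """Plane-wave coefficients of the two-tube waveguide.

    Junction at x = 0, exit of tube 2 at x = l2, source in tube 1 at x = xs.
    s1, s2 are the cross section areas, zr the radiation impedance at the exit.
    rho and c are assumed values, not taken from the report.
    """
    k = 2.0 * math.pi * f / c
    zc1 = rho * c / s1
    zc2 = rho * c / s2
    # Equation (11): reflection coefficient of tube 2 seen from the junction.
    r2 = cmath.exp(-2j * k * l2) * (zr / zc2 - 1.0) / (zr / zc2 + 1.0)
    # Equation (9): reflection coefficient in tube 1.
    r1 = (s1 * (1.0 + r2) - s2 * (1.0 - r2)) / (s1 * (1.0 + r2) + s2 * (1.0 - r2))
    # Equation (13): amplitude in tube 1 imposed by the source flow.
    a1 = us * zc1 / (cmath.exp(-1j * k * xs) - r1 * cmath.exp(1j * k * xs))
    # Equation (8): A1 = C * A2.
    c_factor = 0.5 * (1.0 + r2 + (s2 / s1) * (1.0 - r2))
    a2 = a1 / c_factor
    return k, (a1, r1), (a2, r2)


def pressure(x, k, a, r):
    """Pressure of equation (6) in the tube whose coefficients are (a, r)."""
    return a * (cmath.exp(-1j * k * x) + r * cmath.exp(1j * k * x))
```

A transfer function between two points then follows as the ratio of two `pressure` values, each evaluated with the coefficients of the tube that contains the point. Viscothermal losses could be introduced by passing a complex wavenumber instead of the real `k` computed here.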
Figure 6: Transfer function measurement method.
4. Transfer function measurement
4.1 Transfer function measurement method
To measure the transfer function between two points inside a vocal tract replica, a stepped
frequency sweep is used. A sinusoidal signal is generated for a fixed amount of time at each
frequency at which the transfer function value is wanted.
A single microphone is used so that no absolute calibration is necessary. The measurement
is made in two stages. The pressure $s_a$ is first measured for each frequency at the first point, of
coordinate $a$, then the pressure $s_b$ is measured at the second point, of coordinate $b$ (see figure 6). During each
measurement the supply voltage $s_0$ is recorded at the same time to provide a phase reference.
The acquired signals are therefore:
$$\begin{aligned}
s_0 &= A_0 e^{j\phi_0} \quad\text{and}\quad s_a = A_a e^{j\phi_a} \\
s_0 &= A_0 e^{j\phi_0} \quad\text{and}\quad s_b = A_b e^{j\phi_b}
\end{aligned} \tag{15}$$
Both transfer functions $H_{0a}$ and $H_{0b}$ between the supply voltage and the pressure at the measurement points are then estimated. To compute the modulus, the amplitude of the signal measured by the
microphone is divided by the supply voltage amplitude. The phase is obtained by computing the phase shift between the signal measured by the microphone and the supply
voltage. So the transfer functions $H_{0a}$ and $H_{0b}$ are:
$$H_{0a} = \frac{A_a}{A_0}\,e^{j(\phi_a - \phi_0)}, \qquad H_{0b} = \frac{A_b}{A_0}\,e^{j(\phi_b - \phi_0)} \tag{16}$$
The transfer function $H_{ab}$ between the measurement points a and b is obtained as the ratio
$H_{0b}/H_{0a}$. The transfer function $H_{0a}$ corresponds to the product of the transfer functions of the
acoustic source, the propagation of sound from the source to the point a, the probe, the microphone
and the microphone conditioner. If the experimental conditions are exactly the same for the measurement of $H_{0a}$ and $H_{0b}$, the transfer function $H_{0b}$ is the product of
$H_{0a}$ and the transfer function $H_{ab}$ which one wants to know. Thus we have:
$$H_{ab} = \frac{H_{0b}}{H_{0a}} \tag{17}$$
The transfer function of the whole measurement system is thus eliminated by this computation. This
method, though somewhat time-consuming, gives good quality results because enough energy is supplied at each frequency
to make a good measurement.
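Equations (16) and (17) translate directly into a few lines of code. The sketch below is illustrative only (the function names are hypothetical); it takes the amplitudes and phases extracted at one frequency for the microphone signal and for the supply voltage, and forms the transfer function between the two measurement points.

```python
import cmath


def transfer_to_reference(amp, phase, amp_ref, phase_ref):
    """Equation (16): microphone signal relative to the supply voltage."""
    return (amp / amp_ref) * cmath.exp(1j * (phase - phase_ref))


def transfer_between_points(h0a, h0b):
    """Equation (17): transfer function between points a and b."""
    return h0b / h0a
```

Because each stage is referenced to the supply voltage recorded at the same time, the two stages do not need the same supply amplitude or starting phase. Any factor common to both stages, such as the response of the source, probe and microphone, cancels in the ratio, provided it does not change between the two stages.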
4.2 Problems encountered when trying to perform measurements near the source
Transfer function estimation was first performed between the entrance and the exit of
the one tube replica. The measurement points were very close to the communication hole between
the source and the tube, and at the exit of the tube. The frequency range was chosen below the
cutoff frequency of the first transverse mode (about 7000 Hz), so the plane mode was expected to
be predominant. However, the transfer functions obtained were not in agreement with plane wave
theory. These differences are due to the fact that evanescent non planar modes are not negligible at
the tube extremities.
Figure 7: Pressure field |P| (dB) measured on a 20 mm × 10 mm surface containing the tube axis just in front
of the communication hole (located at y = 0 and x = 0) for a source frequency of 2548 Hz. Non
planar evanescent modes due to radiation of the source through the communication hole inside the
tube can be seen.
A pressure measurement on a surface (20 mm × 10 mm) perpendicular to the tube axis just in front of
the communication hole shows that the plane wave assumption is not valid there (see figure 7).
Close to the source, the pressure perturbation can be large over a short distance.
Even if the theory took evanescent non planar modes into account, errors due to the uncertainty on the probe
location would remain critical. The neighborhood of the communication hole has therefore been avoided
when estimating transfer functions.
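The cutoff frequency of about 7000 Hz quoted in section 4.2 can be checked with a short calculation. In a rigid-walled circular duct of radius a, the first transverse mode starts to propagate when ka reaches the first zero of the derivative of the Bessel function J1, approximately 1.8412. The sketch below applies this to the 29.5 mm tube; the sound speed of 343 m/s is an assumed room-temperature value, since the measured temperature is not quoted in the text.

```python
import math

# First zero of the derivative of the Bessel function J1 (value of k*a at
# which the first transverse mode of a rigid circular duct starts to propagate).
KA_FIRST_TRANSVERSE = 1.8412


def first_cutoff_frequency(diameter_m, c=343.0):
    """Cutoff frequency in Hz of the first transverse mode of a circular duct.

    c is an assumed sound speed in m/s.
    """
    radius = diameter_m / 2.0
    return KA_FIRST_TRANSVERSE * c / (2.0 * math.pi * radius)


if __name__ == "__main__":
    print(round(first_cutoff_frequency(0.0295)))
```

Under these assumptions the result is close to 6.8 kHz for a diameter of 29.5 mm, consistent with the order of magnitude given above.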
4.3 Measured transfer functions
4.3.1 One tube replica
Three transfer functions have been measured with the setup and method described previously.
A 160 W Sphynx SP-DYN-PRO2 acoustic source and a B&K type 4182 microphone with a 200
mm long and 1 mm wide probe were used. The duct was 170 mm long with an internal
diameter of 30 mm. It ended in a 300 mm wide and 400 mm long screen. The pressure was
measured at 3 points labeled 1, 2 and 3, located respectively at 120 mm, 80 mm and 40 mm from the
duct entrance (see figure 8), at frequencies from 2 kHz to 10 kHz in steps of 50 Hz.
The modulus and phase of the transfer function measured between points 1 and 3 are presented
in figure 9. The theoretical transfer functions have been computed with plane wave theory, assuming
that the duct ends in an infinite screen. Viscothermal losses are taken into account. The experimental
results show good agreement with theory.
Figure 8: Measurement points locations in the one tube replica.
Figure 9: Modulus |H13| (dB) and phase φ (deg), plotted against frequency f (Hz), of the transfer function between two points located at 120 mm and 40
mm from the entrance of a duct which is 170 mm long and has an internal diameter of 30 mm. The
dots have been obtained by measurement and the line is computed from the theory.
4.3.2 Two tubes replica
Six transfer functions have been measured on both the centric and eccentric two tube replicas.
The measurements were performed in two stages because the source used for generating high
frequencies (above 2000 Hz) cannot be used at low frequencies. The transfer functions were therefore first
measured between 2000 Hz and 10000 Hz with a high frequency source (Eminence PSD:2002S-8) and then between 100 Hz and 2000 Hz with a low frequency source (Monacor KU-916T).
The pressure has been measured at 4 points labeled 1, 2, 3 and 4 (see figures 10 and 11). The
transfer functions between these points have then been computed. As an example, the one between
points 1 and 2 is presented in figure 12.
As one can see, the transfer functions of the two configurations are quite similar at low frequency (up to about 5000 Hz).
This is no longer the case at high frequency. The most noticeable difference is the presence of maxima
(for example at 7220 Hz and 7910 Hz) and minima (for example at 7060 Hz and 7500 Hz) above
7000 Hz in the eccentric case, which do not appear in the centric case.
This difference is due to the fact that in the eccentric configuration the non planar propagation modes are excited, whereas they are almost non-existent in the centric configuration. When the
frequency is higher than their cutoff frequency, these modes can propagate and generate resonances other than the plane wave resonances. This explains the additional maxima in
the transfer function. The results obtained from FEM simulations are in agreement with these measurements and confirm this difference of behaviour between the two configurations. For a comparison
between experiment and FEM simulations the reader is referred to the WP5 deliverable (D5.1 Simulation
and Validation of VT sound with static geometries).
Figure 10: Dimensions and measurement points locations for the two tubes centric replica.
Figure 11: Dimensions and measurement points locations for the two tubes eccentric replica.
Figure 12: Modulus |H12| (dB) and phase φ(H12) (deg), plotted against frequency f (Hz), of the transfer function between points 1 and 2 (located at 30 mm and
90 mm from the source) of the centric and eccentric two tube replicas (see figures 10 and 11).
Figure 13: Measurement points locations in vowels replicas.
4.3.3 Vowels
Transfer function measurements have been performed on 3D printed vocal tract replicas. These
replicas are concatenations of cylinders corresponding respectively to the vowels /a/, /i/ and /u/ (the area
functions have been taken from [3]). All the cylinders share the same central axis.
The pressure has been measured at three locations inside and outside these replicas (see figure 13).
Three transfer functions between these points have then been computed in the same way as
for the two tube replicas. The acoustic sources used are also the same. Three examples
corresponding to the 3 vowels are displayed in figure 14.
No large maxima that could result from transverse propagation modes, such as those observed for the eccentric two-tube replica, are noticeable at high frequency. This
result is expected since the axisymmetric configuration chosen does not allow the first transverse propagation modes to be generated. This will be checked by measurements on an eccentric
replica of vowel /a/, which will be available soon.
5. Reflection coefficient estimation
5.1 The two microphone method
The measurement of transfer functions between two points inside a tube makes it possible to
estimate the reflection coefficient. This method is called the two-microphone method
[1] [2].
Considering that the pressure at each point is the sum of an incident wave and a reflected one, it
can be expressed at points 1 and 2 by the following equations:
$$P_1 = e^{-jkx_1} + R\,e^{jkx_1}, \qquad P_2 = e^{-jkx_2} + R\,e^{jkx_2} \tag{18}$$
where $R$ is the reflection coefficient at the end of the tube. The transfer function between the
two points is $H_{12} = P_2/P_1$. Replacing $P_1$ and $P_2$ by their expressions in (18) provides an expression
of $R$:
$$R = \frac{e^{-jkx_2} - H_{12}\,e^{-jkx_1}}{H_{12}\,e^{jkx_1} - e^{jkx_2}} \tag{19}$$
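The inversion in equation (19) can be checked numerically with a round trip: pressures are synthesised from a known reflection coefficient with equation (18), and the coefficient is recovered from their ratio. The function names below are hypothetical and the sketch assumes a lossless tube (real wavenumber).

```python
import cmath


def pressure_ratio(r, k, x1, x2):
    """H12 = P2 / P1 with P1 and P2 given by equation (18)."""
    p1 = cmath.exp(-1j * k * x1) + r * cmath.exp(1j * k * x1)
    p2 = cmath.exp(-1j * k * x2) + r * cmath.exp(1j * k * x2)
    return p2 / p1


def reflection_from_h12(h12, k, x1, x2):
    """Equation (19): reflection coefficient from the transfer function H12."""
    numerator = cmath.exp(-1j * k * x2) - h12 * cmath.exp(-1j * k * x1)
    denominator = h12 * cmath.exp(1j * k * x1) - cmath.exp(1j * k * x2)
    return numerator / denominator
```

Expression (19) becomes ill-conditioned when k(x2 - x1) approaches a multiple of π, that is when the spacing between the two points is close to a multiple of half a wavelength: the numerator and the denominator then both tend to zero, so small measurement errors are strongly amplified.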
This reflection coefficient can be compared to the theoretical one obtained from the radiation
impedance ZR with the following expression :
$$R = \frac{Z_R/Z_c - 1}{Z_R/Z_c + 1} \tag{20}$$
A common way of representing the reflection coefficient is to plot its modulus and the length
correction corresponding to the phase shift induced by the reflection. This length correction $\delta$ is given
by the following relation:
$$R = -|R|\,e^{-j2k\delta} \tag{21}$$
Figure 14: Modulus |H12| (dB) and phase φ(H12) (deg), plotted against frequency f (Hz), of the transfer function between points 1 and 2 (located at 100 mm and
40 mm from the exit) of vowel /a/, /i/ and /u/ replicas (see figure 13).
5.2 Reflection coefficient estimation from experimental data
The transfer functions measured in the one tube replica have been used to estimate its reflection
coefficient.
The theoretical reflection coefficient has been estimated in two different ways:
• using the theoretical expression (20);
• by computing the pressure at points 1, 2 and 3 with expression (1), taking viscothermal losses into
account. The same transfer functions as in the experiment are then evaluated and the reflection coefficient is
deduced from equation (19).
Both theoretical estimates and the experimental values are plotted in figure 15. The
experimental values are close to the theoretical ones except around some values of ka (0.6, 1.2, 1.7 and
2.3). Taking viscothermal losses into account in the theoretical values reduces these differences, indicating that viscothermal losses could be a plausible explanation of the differences between
experiments and theory.
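For plotting, a complex reflection coefficient has to be converted into the two quantities defined by equation (21). A minimal sketch of that conversion is given below; the function name is hypothetical, and the result is only unambiguous while 2kδ stays within (-π, π].

```python
import cmath


def modulus_and_length_correction(r, k):
    """Return (|R|, delta) such that R = -|R| * exp(-2j * k * delta).

    r -- complex reflection coefficient
    k -- real wavenumber in rad/m
    """
    delta = -cmath.phase(-r) / (2.0 * k)
    return abs(r), delta
```

Dividing the returned length correction by the tube radius a gives the ratio of length correction to radius, which can then be plotted against ka.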
Figure 15: Modulus |R| and ratio of length correction to radius of the reflection coefficient estimated
from the transfer function between points 1 and 3 of the one tube replica (see figure 8), plotted
against the ka product. Curves: without VT losses, with VT losses, experiments.
6. Surface measurement
Measurements on surfaces at a given frequency have been performed to identify the kind of
modes involved in the propagation inside the replicas. The frequencies have been chosen as close as
possible to resonances and anti-resonances.
6.1 One tube replica
Measurements performed on the one tube replica show that at low frequency (up to about 5000
Hz) the acoustic field inside the replica (see figure 16) is one-dimensional (plane waves),
except at the ends of the tube where evanescent non planar modes exist.
At higher frequency, non planar propagation modes can be observed (see figure 17). The mode
observed in figure 17 is not expected for a perfectly axisymmetric geometry. It exists experimentally
because the replica is not perfectly axisymmetric. This shows that modelling a vocal tract as
a concatenation of cylinders sharing the same axis does not account for this kind of mode, whereas
a real vocal tract is never axisymmetric and this kind of mode is expected to exist.
6.2 Two tube replica
The same kind of measurement has been performed on the two tube replicas.
As for the one tube replica, at low frequency the plane wave theory describes the internal wave field well
in both the centric and eccentric cases (see figures 18 and 20), except where
evanescent non planar modes are present.
At high frequency the effect of eccentricity becomes visible. At the same frequency (7400 Hz),
although non planar modes can be detected in the centric configuration (see figure 19), these higher
order modes are predominant in the eccentric case (see figure 21). This result illustrates that, in a vocal
tract geometry built as a concatenation of tubes, the eccentricity of each section has an influence.
This difference also confirms the assumptions made from the transfer functions. At the zeros observed
at high frequency, the wave field is dominated by the non planar modes in the eccentric case,
whereas in the centric case the plane waves remain predominant.
Figure 16: Amplitude of acoustic pressure P (dB) measured inside and outside a one tube vocal tract replica
at 3340 Hz.
Figure 17: Amplitude of acoustic pressure P (dB) measured inside and outside of a one tube vocal tract
replica at 6810 Hz.
Figure 18: Amplitude of acoustic pressure P (dB) measured inside and outside of a two centric tubes replica
at 3060 Hz.
Figure 19: Amplitude of acoustic pressure P (dB) measured inside and outside of a two centric tubes replica
at 7400 Hz.
Figure 20: Amplitude of acoustic pressure P (dB) measured inside and outside of a two eccentric tubes
replica at 2550 Hz.
Figure 21: Amplitude of acoustic pressure P (dB) measured inside and outside of a two eccentric tubes
replica at 7400 Hz.
The same pressure patterns are obtained with FEM simulations, with the difference that
for the centric case at high frequency no transverse mode can be seen, since the mesh used is perfectly
symmetric. For a comparison between experiment and FEM simulations the reader is referred to the WP5
deliverable (D5.1 Simulation and Validation of VT sound with static geometries).
7.
Conclusion
The main challenge of this first year was to design, to build and to use a specific experimental
set-up in order to measure accurately and extensively the acoustics of vocal tract replicas. A step-bystep procedure, starting with simple academic geometries allowed thus to optimise the set-up as well
as the associated signal processing techniques. In particular, by comparing the measured data with
theoretical expectations, some spurious experimental artefacts have been suppressed or avoided. This
work achieved, reliable and meaningful data could have been shared with WP5 in order to validate
the numerical simulations as well as to investigate the possible origin of some departures.
This work also allowed us to illustrate some important features that are seldom mentioned in
the speech literature. The occurrence of higher acoustical modes is a spectacular example that affects
both the internal and the radiated sound field. A sensible study of this effect enhances further that the
greatest care must be taken with the three-dimensional geometrical description of the vocal tract.
The experimental set-up has been successfully extended to deformable vocal tract replicas. During the practical work of Boris Mondet (Université Joseph Fourier), a single deformable tube was
constrained by two plates driven by a step motor in order to generate and control a dynamic constriction. This will allow us to simulate slight to large vocal tract movements (articulation), in particular in
view of the goals of years 2 and 3 of the present project.
Publications:
• Rémi Blandin, Xavier Pelorson, Annemie Van Hirtum, Rafaël Laboissière, Oriol Guasch and
Marc Arnela (2014), "Effet des modes de propagation non plan dans les guides d'ondes à section
variable", accepted at the 12th French Congress on Acoustics, to appear in proceedings.
• Xavier Pelorson, Annemie Van Hirtum, Boris Mondet, Oriol Guasch and Marc Arnela (2013),
"Three-dimensional vocal tract acoustics", Acoustics 2013, November 10-15, New Delhi, India.
• Boris Mondet (2013), "Comportement acoustique du conduit vocal humain", rapport de stage,
Juin-Juillet 2013, Université Joseph Fourier, Département Licence Sciences et Technologies.
REFERENCES
1 M Åbom and H Bodén. Error analysis of two-microphone measurements in ducts with flow. The
Journal of the Acoustical Society of America, 83:2429, 1988.
2 AF Seybert and DF Ross. Experimental determination of acoustic properties using a two-microphone
random-excitation technique. The Journal of the Acoustical Society of America, 61:1362, 1977.
3 BH Story. Comparison of magnetic resonance imaging-based vocal tract area functions obtained
from the same speaker in 1994 and 2002. The Journal of the Acoustical Society of America,
123:327, 2008.
4 A Van Hirtum and Y Fujiso. Insulation room for aero-acoustic experiments at moderate Reynolds
and low Mach numbers. Applied Acoustics, 73(1):72–77, 2012.
A. Article: three-dimensional vocal tract acoustics
THREE-DIMENSIONAL VOCAL TRACT ACOUSTICS
Xavier Pelorson, Annemie Van Hirtum, Boris Mondet
Gipsa-Lab, Département parole et Cognition, UMR CNRS UMR 5216 CNRS/INPG/UJF/Université
Stendhal, 11 rue des Mathématiques F-38420 Saint Martin d'Hères, France
Oriol Guasch, Marc Arnela
GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins
2, Barcelona 08022, Catalonia, Spain
e-mail: [email protected]
At present, the theoretical models used in speech synthesis as well as in speech analysis
(such as inverse filtering, for instance) rely on low-frequency acoustic propagation models
(one dimensional approximation) through one-dimensional vocal tract approximations (using
an area function). The one dimensional approximation can be justified, to a certain extent, in
the case of voiced sounds due to the low-frequency behavior of the glottal source and due to
its position inside the vocal tract. This is not the case for plosives and fricatives for which one
can expect the generation and the propagation of higher acoustical modes. These higher
modes are then predominant not only inside a resonator but also have a spectacular effect on
the radiated sound in terms of directivity. Based on anatomical considerations, one can
estimate the first cut-on frequency of these higher acoustical modes to lie around 4-5 kHz,
which is in the middle of a typical speech spectrum and close to the maximum of sensitivity
of our ears. Perceptual effects of these higher acoustical modes can therefore be expected to
be considerable. A theoretical model based on a modal approach is then presented as an
alternative to plane-wave models. It is shown that, using this theoretical model, the solution
of the wave equation is analytical in the case of a simple geometry and can be extended
numerically to the case of more complex resonator shapes (closer to the human vocal tract)
by a matching mode procedure. Measurements of the acoustic pressure inside and radiated
from replicas of the vocal tract, using a sound probe driven by a micrometric 3-D stage
positioning system, will be presented and discussed. The experimental data will then be
compared with the theoretical predictions and with numerical simulations using the Finite
Element Method. Simple geometries, using concatenated tubes, will first be considered in order
to illustrate three-dimensional effects. Different vowel replicas, obtained by 3-D printing of MRI
data, will then be considered.
ACOUSTIS2013NEWDELHI, New Delhi, India, November 10-15, 2013
1. Introduction
Classical textbooks on physical models of speech production [1], [2] describe the propagation of
sound inside the vocal tract on the basis of a plane wave decomposition. Although the same textbooks
clearly indicate that this description relies on a low-frequency assumption, the limits of validity of the
underlying theory are not clearly established.
As our knowledge of the sound sources and of the three-dimensional vocal tract geometry
[3] increases in complexity and in accuracy, in-vivo measurements and computer simulations
clearly reveal spectacular departures from plane wave theory, even at moderate frequencies (of
the order of 5 kHz) [4].
As a plausible explanation for these departures, we first present a theoretical investigation of
sound propagation inside a simplified vocal-tract like waveguide focusing in particular upon the
three dimensional effects due to the presence of higher acoustical modes.
Results obtained using numerical simulations and measurements on replicas of the vocal tract
will then be presented and discussed.
2. Theoretical aspects
We first consider the case of a uniform waveguide. Let (O,x1,x2,x3) be any coordinate system, x3
being parallel to the waveguide axis. In the frequency domain, a general solution of the wave equation for the acoustic pressure, p, might be sought in the form:
$$p = \sum_{m,n=0}^{\infty} \psi_{mn}(x_1,x_2)\left[A_{mn}\,e^{-\gamma_{mn} x_3} + B_{mn}\,e^{\gamma_{mn} x_3}\right] := \sum_{m,n=0}^{\infty} \psi_{mn}(x_1,x_2)\,P_{mn} \qquad (1)$$
where $A_{mn}$ and $B_{mn}$ are two constants depending on the end conditions. The wave number of mode (m,n), $\gamma_{mn}$, as well as the eigenfunctions $\psi_{mn}$, depend on the geometry of the waveguide, on the hygrometry and on the boundary conditions at the wall. When the coordinate system $(x_1, x_2)$ is separable, the eigenfunctions $\psi_{mn}$ can be written in the form:
$$\psi_{mn}(x_1,x_2) = N_{mn}\,f_m(x_1)\,g_n(x_2) \qquad (2)$$
with $f_m$ and $g_n$ two orthogonal functions and $N_{mn}$ a constant. $k_m$ and $k_n$ are the eigenvalues associated with
the eigenfunctions $\psi_{mn}$.
The dispersion relationship provides:
$$\gamma_{mn}^2 = k_m^2 + k_n^2 - k^2 \qquad (3)$$
Equation (3) shows that a given acoustical mode (m,n) will be propagating only if $k^2 > k_m^2 + k_n^2$. As an
example, writing $k = 2\pi f/c$ and $k_m = 2\pi f_m/c$, one sees that the (m,0) mode will be propagating only if
the excitation frequency $f$ is higher than $f_m$; $f_m$ is called the cut-on frequency of the mode. Mode
(0,0) is always propagating because $f_0 = 0$. This is the so-called plane wave.
It is worth mentioning that the decomposition used in (2) is only possible when the waveguide geometry is compatible with the coordinate system (x1, x2). In practice, this corresponds to rectangular
shapes (compatible with Cartesian coordinates) or elliptic shapes (compatible with prolate spheroidal coordinates). More complex shapes can however be assessed using approximate methods [5]. Lastly,
viscous and thermal losses can be accounted for using a boundary layer approximation.
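As an illustration of the cut-on condition, the following minimal sketch evaluates the cut-on frequency of mode (m,n) in a rectangular duct. It assumes perfectly rigid walls, for which the transverse eigenvalues are k_m = mπ/a and k_n = nπ/b, and neglects losses. The side lengths `a`, `b` and the speed of sound `c` are illustrative values chosen here, not data from the measurements.

```python
import math


def cut_on_frequency(m, n, a, b, c=350.0):
    """Cut-on frequency (Hz) of mode (m, n) in a rigid-walled rectangular duct.

    a, b : side lengths of the cross-section in metres (hypothetical values).
    c    : speed of sound in m/s (assumed value, not taken from the paper).
    """
    k_m = m * math.pi / a  # transverse eigenvalue along x1 (rigid walls)
    k_n = n * math.pi / b  # transverse eigenvalue along x2 (rigid walls)
    # The mode propagates when k^2 > k_m^2 + k_n^2, with k = 2*pi*f/c.
    return c * math.sqrt(k_m ** 2 + k_n ** 2) / (2.0 * math.pi)


def is_propagating(f, m, n, a, b, c=350.0):
    """True if mode (m, n) propagates at excitation frequency f (Hz)."""
    return f > cut_on_frequency(m, n, a, b, c)


if __name__ == "__main__":
    a, b = 0.04, 0.02  # hypothetical 40 mm x 20 mm cross-section
    for mode in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(mode, round(cut_on_frequency(*mode, a, b)), "Hz")
```

With these assumed values, the (1,0) mode of a 40 mm wide duct cuts on at 4375 Hz, which is of the same order as the 4-5 kHz estimate quoted in the abstract.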
A change of geometry may be described using a piecewise method as a succession of local discontinuities.
[Schematic: section i (cross-section Si) followed by section i+1 (cross-section Si+1) along the x3 axis.]
Figure 1: Change of section between two waveguides i and i+1
Let p(i) and p(i+1) be the acoustical pressures in section (i) and in section (i+1), respectively. Using modal decomposition one has:
$$p^{(i)} = \sum_{m,n=0}^{\infty} \psi_{mn}(x_1,x_2)\,P_{mn}^{(i)} \qquad (4)$$
$$p^{(i+1)} = \sum_{p,q=0}^{\infty} \varphi_{pq}(x_1,x_2)\,P_{pq}^{(i+1)} \qquad (5)$$
where $\psi_{mn}$ (respectively $\varphi_{pq}$) are the eigenfunctions associated with guide (i) (respectively
(i+1)). Applying the continuity of pressure at the junction between the two guides, of cross-sections Si and Si+1,
gives:
$$P_{mn}^{(i)} = \sum_{p,q=0}^{\infty} P_{pq}^{(i+1)}\,\frac{1}{S_i}\int_{S_i} \varphi_{pq}(x_1,x_2)\,\psi_{mn}^{*}(x_1,x_2)\,dS \qquad (6)$$
In a similar way, continuity of the velocity along x3 provides a second relationship between the velocity amplitudes $V_{pq}^{(i+1)}$ and $V_{mn}^{(i)}$:
$$V_{pq}^{(i+1)} = \sum_{m,n=0}^{\infty} V_{mn}^{(i)}\,\frac{1}{S_{i+1}}\int_{S_{i+1}} \varphi_{pq}^{*}(x_1,x_2)\,\psi_{mn}(x_1,x_2)\,dS \qquad (7)$$
For a vocal tract geometry discretized using N sections, equations (6) and (7) thus form a system of
N equations with N+2 unknowns. The specific boundary conditions at both ends of the vocal tract
(at section 1 and section N) provide the last two equations.
Equation (6) already points out an important geometrical effect. If two sections share the same axis,
as in figure 1, the resulting integral in (6) will always equal zero for the first higher acoustical modes,
because these modes are antisymmetric. This effect is illustrated with the synthetic example of two connected tubes. The
first one is 85 mm long with a diameter of 14.5 mm, while the second one is also 85 mm long with a
diameter of 30 mm. Figure 2 presents the calculated transfer functions (output acoustic pressure /
glottal volume velocity) for centered and for eccentric tubes.
Figure 2: Transfer function of a two-tube junction. Left: centered, right: eccentric.
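The vanishing of the coupling integral for centered sections can be checked on a simplified case. The sketch below is a one-dimensional (slab) analogue, not the circular geometry used in the paper: it projects the plane wave of a narrow section of width `w` onto the first transverse mode cos(πx/a) of a wider rigid-walled section of width `a`. The function name and the choice of widths (borrowed from the two tube diameters) are assumptions made for illustration.

```python
import math


def coupling_integral(x0, w, a):
    """Normalised overlap between a uniform (plane wave) profile on
    [x0 - w/2, x0 + w/2] and the first transverse mode cos(pi*x/a)
    of a rigid-walled section occupying [0, a].

    Computes (1/w) * integral of cos(pi*x/a) dx in closed form.
    """
    lo, hi = x0 - w / 2.0, x0 + w / 2.0
    return (a / (math.pi * w)) * (
        math.sin(math.pi * hi / a) - math.sin(math.pi * lo / a)
    )


if __name__ == "__main__":
    a = 0.030   # width of the large section (m), illustrative
    w = 0.0145  # width of the small section (m), illustrative
    centered = coupling_integral(a / 2.0, w, a)  # sections share the same axis
    offset = coupling_integral(w / 2.0, w, a)    # small section against one wall
    print("centered:", centered)
    print("offset:  ", offset)
```

In the centered case the antisymmetric mode integrates to zero over the junction (up to rounding), so the plane wave cannot excite it, whereas an offset junction gives a non-zero coupling.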
3. Experimental and Numerical Methods
3.1 Experimental set-up
The experimental set-up uses replicas of the vocal tract made of Plexiglas or of ABS printed
using 3-D printers. The exit end of each replica is mounted inside a rigid plane baffle, while a
compression chamber provides the excitation, through a 1 mm diameter hole, at the entrance. A
sound pressure probe (Brüel & Kjær 4182) can be displaced inside and outside the replica using a
stage positioning system (with an accuracy of 4 µm). All measurements were performed in a sound-insulated room.
3.2 Numerical simulations
To carry out the numerical simulations, the Finite Element Method (FEM) has been used to
solve the acoustic wave equation in the time domain. In order to account for free-field propagation
and to consider a computational domain of a reasonable size as well, the latter has been surrounded
with a Perfectly Matched Layer (PML), which avoids any spurious reflection at the domain
boundaries. The PML formulation developed in [6] has been adapted to the FEM framework and
the resulting modified wave equation has been solved using an explicit time evolving scheme (see
[7] for details of the implemented formulation).
Each simulated duct system or vowel exits at a rigid baffle with dimensions 0.25 m x 0.25 m.
The baffle constitutes one surface of a rectangular volume of 0.25 m x 0.25 m x 0.1 m in size,
which allows sound waves emanating from the tube system to propagate towards infinity. As said, this
volume is surrounded by a 0.1 m wide PML with a relative reflection coefficient of 10^-4. With
regard to the boundary conditions, a frequency-independent boundary admittance µ=0.0005 has been
assigned at the duct walls to introduce some losses, and a sinusoid having the same frequency as that in
the corresponding experimental test has been imposed at the duct entrance. The resulting
computational domains have been meshed following the ten nodes per wavelength accuracy criterion
[8]. Proper time step values have been chosen for each FEM mesh to fulfil a stability condition of
the Courant-Friedrichs-Lewy type. The speed of sound has been computed using the temperature at
which the experiments were performed. A numerical simulation lasting 25 ms has been carried out
for each analyzed case, capturing the acoustic pressure within the tube and in the near field on a
predefined grid with a spatial resolution of 0.002 m, to allow comparisons with experiments. The mean
pressure at each grid point has been computed from the last 5 ms of the numerical simulation.
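The two discretization rules mentioned above can be sketched as follows. This is only an order-of-magnitude illustration: the time-step bound used here, dt ≤ h / (c√d), is the classical limit for an explicit scheme on a uniform grid in d dimensions and is an assumption, since the exact stability constant of the FEM scheme of [7] depends on the element type. The maximum frequency and speed of sound are illustrative values as well.

```python
import math


def max_element_size(f_max, c=343.0, nodes_per_wavelength=10):
    """Largest node spacing (m) that keeps the requested number of nodes
    per wavelength at the highest frequency of interest f_max (Hz)."""
    wavelength = c / f_max
    return wavelength / nodes_per_wavelength


def cfl_time_step(h, c=343.0, dim=3, courant=1.0):
    """Upper bound on the time step (s) from a Courant-Friedrichs-Lewy type
    condition, c*dt/h <= courant/sqrt(dim) (assumed form, uniform grid)."""
    return courant * h / (c * math.sqrt(dim))


if __name__ == "__main__":
    f_max = 10_000.0  # hypothetical highest frequency of interest (Hz)
    h = max_element_size(f_max)
    dt = cfl_time_step(h)
    n_steps = math.ceil(25e-3 / dt)  # steps needed for a 25 ms simulation
    print(f"h = {h * 1e3:.2f} mm, dt = {dt * 1e6:.2f} us, steps = {n_steps}")
```

Under these assumptions a 25 ms simulation requires a few thousand time steps, which gives a feel for the cost of each simulated case.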
4. Results
4.1 Two-tubes
We present the FEM simulation of the two connected tubes configuration considered in
section 2. The first tube is 85 mm long with a diameter of 14.5 mm, while the second one is also 85
mm long but with a diameter of 30 mm. Two configurations are considered, one for which the two tubes
are centered and another for which they are eccentric. The simulated transfer functions for both
configurations are presented in figure 3.
Figure 3: FEM simulations of the transfer function of two connected tubes. Left curve:
centered tubes, right curve: eccentric tubes.
This result confirms the theoretical expectation presented in figure 2. The influence of the
relative position of the two tubes can be clearly seen and analyzed as an effect of higher acoustical
modes.
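To get a feel for the frequency range in which such effects can appear, the sketch below estimates the cut-on frequencies of the first higher modes of the two tubes, treated as rigid-walled circular ducts. The cut-on frequency is f = j' c / (π d), where j' is a zero of the derivative of a Bessel function. The constants are standard tabulated values rounded to four decimals, and the speed of sound is an assumed value, not the one used in the experiments.

```python
import math

# Standard tabulated zeros of Bessel function derivatives (rounded).
JP_11 = 1.8412  # first zero of J1': first non-axisymmetric (antisymmetric) mode
JP_01 = 3.8317  # first non-zero zero of J0': first axisymmetric higher mode


def circular_cut_on(diameter, bessel_zero, c=343.0):
    """Cut-on frequency (Hz) of a higher mode in a rigid-walled circular duct.

    diameter    : duct diameter in metres.
    bessel_zero : zero of the Bessel function derivative for that mode.
    c           : speed of sound in m/s (assumed value).
    """
    return bessel_zero * c / (math.pi * diameter)


if __name__ == "__main__":
    for d in (0.0145, 0.030):  # tube diameters of the two-tube example
        f_anti = circular_cut_on(d, JP_11)
        f_axi = circular_cut_on(d, JP_01)
        print(f"d = {d * 1e3:.1f} mm: antisymmetric {f_anti:.0f} Hz, "
              f"axisymmetric {f_axi:.0f} Hz")
```

Under these assumptions the first antisymmetric mode of the 30 mm tube cuts on at roughly 6.7 kHz, while that of the 14.5 mm tube lies near 13.9 kHz. The antisymmetric mode is the one that, according to equation (6), a centered junction cannot excite.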
4.2 Vowels
The 3-D geometries of several vowels [3] were used for both FEM simulations and acoustical
measurements. Figure 4 presents an example of comparison between FEM simulations and
measured data in the case of vowel /i/.
Figure 4: Left: FEM simulation of the acoustical pressure inside and outside an /i/ vocal tract.
Right: Comparison between the simulation and the measured data on the center line. Results for an
excitation at 4500 Hz.
Figure 5: Comparison between FEM simulation and theoretical transfer function for vowel /a/.
As a last example, we present, in figure 5, a comparison between FEM simulations and
theoretical prediction for the transfer function of vowel /a/.
FEM simulations agree very well with both theoretical expectations and measured data. Some
discrepancies can however be observed at high frequency, which can probably be attributed to the
different radiation models.
5. Conclusions
This paper describes an extension of the plane wave theory to the case of 3-D vocal tract
geometry. The theoretical model has been successfully compared with both FEM simulations and
experimental data obtained on casts of vocal tracts. The 3-D effects appeared to be significant in the
high frequency domain and depend strongly on the geometrical discretization. The appearance of
higher acoustical modes led to zeros in the transfer function (at the cut-on frequency of these
modes) and to extra resonances. Further, contrary to plane waves, these higher modes generate a
highly directive sound pressure field.
Because these effects occur in the higher frequency range, their relevance for vowels may be
limited, since the glottal sound source is of low-frequency nature. However, in the case of plosives or
fricatives, they can probably be expected to be significant. Additional experiments and simulations
will be performed to confirm this conclusion.
6. Acknowledgements
This research has been supported by EU-FET grant EUNISON 308874.
REFERENCES
[1] Fant G., 1960. Acoustic Theory of Speech Production, Mouton, The Hague.
[2] Flanagan J.L., 1972. Speech Analysis Synthesis and Perception, 2nd Edition, Springer-Verlag, Berlin.
[3] Story B.H., 2008. Comparison of magnetic resonance imaging-based vocal tract area
functions obtained from the same speaker in 1994 and 2002, Journal of the Acoustical Society of America, 123, 327-335.
[4] Elmasri S., Pelorson X., Saguet P., Badin P., 1998. The use of the Transmission Line
Matrix in acoustics and in speech, International Journal of Numerical Modeling, 11, 133-151.
[5] Laboissière R., Yehia H.C., Pelorson X. Higher order modes propagation in the human
vocal tract, Proceedings of Acoustics 2012, Nantes, France.
[6] Grote M. and Sim I., 2010. Efficient PML for the wave equation, Global Science Preprint,
arXiv: math.NA/1001.0319v1.
[7] Arnela M. and Guasch O., 2013. Finite element computation of elliptical vocal tract
impedances using the two-microphone transfer function method, Journal of the Acoustical
Society of America, 133 (6), 4197–4209.
[8] Ihlenburg F., 1998. Finite Element Analysis of Acoustic Scattering, Applied Mathematical
Sciences, Springer, Berlin, Chap. 2.