D6.1 VOCAL TRACT REPLICAS AND ACOUSTIC MEASUREMENTS

Xavier Pelorson, Rémi Blandin, Annemie Van Hirtum and Xavier Laval
Gipsa-lab, 11 rue des Mathématiques, Grenoble Campus, Saint Martin d'Hères, France
e-mail: [email protected], [email protected], [email protected], [email protected]

The goal of this work is to provide accurate and extensive acoustic measurements on various vocal tract replicas in order to validate the numerical simulations performed in WP5. This task first required the development of a specific measurement set-up to acquire the acoustic pressure inside vocal tract replicas at specific positions or over a whole surface using a 3D stage positioning system. The second major challenge was the optimization of each element of this set-up and the post-processing of the acquired data. In close collaboration with WP5, measurements have been performed for vocal tract replicas of increasing complexity. The comparisons with the numerical simulations performed in WP5 were complemented with comparisons using theoretical predictions obtained from simple acoustical theory.

Contents
1 Introduction
2 Experimental setup
  2.1 Setup
  2.2 Vocal tract replicas and acoustic excitation
  2.3 Acoustic pressure modulus and phase estimation
3 Theory
  3.1 Plane wave theory
    3.1.1 One tube
    3.1.2 Two tubes
4 Transfer function measurement
  4.1 Transfer function measurement method
  4.2 Problems encountered when trying to perform measurements near the source
  4.3 Measured transfer functions
    4.3.1 One tube replica
    4.3.2 Two tubes replica
    4.3.3 Vowels
5 Reflection coefficient estimation
  5.1 The two microphone method
  5.2 Reflection coefficient estimation from experimental data
6 Surface measurement
  6.1 One tube replica
  6.2 Two tube replica
7 Conclusion
REFERENCES
A Article : three dimensional vocal tract acoustics

EUNISON is supported by the Future and Emerging Technologies (FET) programme within the ICT theme of the Seventh Framework Programme for Research of the European Commission
EUNISON: Extensive UNIfied-domain SimulatiON of the human voice

1. Introduction
A very commonly used geometrical approximation of the vocal tract consists in a succession of tubes having different cross sections and sharing the same axis. This approximation implicitly assumes plane wave propagation. However, the human vocal tract is not perfectly axisymmetric, and transverse modes could be generated and involved in the propagation of sound in the frequency range of interest for speech production. An approximation consisting of a succession of tubes that takes the eccentricity of the vocal tract and the transverse propagation modes into account may be better. To investigate to what extent the plane wave assumption is accurate, measurements of transfer functions and of pressure patterns at a given frequency inside the vocal tract replicas are performed. Several replicas of increasing complexity are studied.
As a particularly illustrative example, two replicas made of a succession of two tubes, one with both tubes sharing the same axis and the other with different axes, are compared. These measurements are compared with plane wave acoustic theory and with Finite Element Method (FEM) simulations.

The experimental setup used to perform the measurements is first presented. Then, the simple acoustic theory for a one tube and a two tubes geometry is introduced. Afterwards, the transfer function estimation method is explained and the experimental results are presented and discussed. An estimation of the reflection coefficient for a one tube replica is described and the experimental data are compared with theory. Finally, the surface measurements are presented; they confirm the observations and the assumptions made from the transfer function measurements.

2. Experimental setup
2.1 Setup
To measure the acoustic pressure inside a vocal tract replica, an experimental setup is used. It is composed of an acoustic source, a probe microphone (B&K type 4182 with a 200 mm long and 1 mm wide probe) moved by a 3-axis positioning system (OWIS PS35), and an anechoic room [4] (1.92 × 1.95 × 1.99 m, volume 7.45 m³) (see figure 2a). A BNC board connects the electrical signals to a computer containing a data acquisition card (PCI-MIO 16XE) (see figure 1). Data acquisition is controlled using LabVIEW. The positioner is used to measure the pressure at various locations inside and outside of the vocal tract replica. The source generates sinusoidal signals at given frequencies. The setup is placed in the anechoic room and acoustic foam is placed under the screen to avoid reflection effects. Temperature is measured for each experiment with a thermometer placed in the anechoic room. For a given position of the probe or frequency of the source, the pressure and the source input voltage signals are recorded for about 1 s.
This allows the modulus and phase of the pressure to be computed.

2.2 Vocal tract replicas and acoustic excitation
Six different vocal tract replicas, of increasing geometrical complexity, have been constructed for this study:
• a simple uniform straight tube of dimensions 29.5 × 170 mm (see figure 3a).
• a two-tube cascade with dimensions 14 × 85 mm for the first tube and 29.5 × 85 mm for the second one. Two geometrical configurations have been considered. In the first one, the centered configuration, both tubes share the same axis of revolution, while in the second one, the eccentered configuration, the axes of revolution are different (see figure 3b and 3c).
• three 3D printings of vocal tract geometries corresponding to the vowels /a/, /i/ and /u/, realised in rigid acrylic. These geometries have been taken from the literature [3] (see figure 4).

Figure 2: Experimental setup.

The acoustic excitation of the replicas was realised using compression chambers. In order to cover the full frequency range of interest for speech (i.e. up to 10 kHz), two compression chambers were used: a Monacor KU-916T for the lowest frequency range (50 Hz - 2 kHz) and an Eminence PSD:2002S-8 for the highest frequencies. To prevent acoustic interference, the sound source was located outside of the anechoic room. The connection between the sound source and the vocal tract replicas was made using an adaptation part (see figure 2b). The acoustic excitation of the replicas was therefore radiated through a hole of 1 mm diameter.

2.3 Acoustic pressure modulus and phase estimation
Noise cannot be excluded and harmonic distortion of the acoustic source is unavoidable. In addition, the signals can have a continuous (DC) component, and transient phenomena can be present when the frequency changes and when the source starts to generate sound. To avoid all of these artefacts, careful signal processing is performed.
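A minimal sketch of such a modulus and phase estimate is given below. It follows the steps of this section: the first 200 ms are discarded, a zero-padded Fourier transform normalised by 2/N is computed, the maximum is searched around the expected frequency, and a parabolic interpolation refines the amplitude. The function name `estimate_tone`, its parameters, the 20 Hz search band and the removal of the mean value are assumptions made for illustration; they are not taken from the LabVIEW implementation used for the measurements.

```python
import numpy as np


def estimate_tone(signal, fs, f_expected, skip_s=0.2, df_max=0.1, search_hz=20.0):
    """Estimate frequency (Hz), amplitude and phase (rad) of a sinusoid.

    signal     : recorded samples
    fs         : sampling frequency in Hz (hypothetical parameter)
    f_expected : frequency requested from the source, in Hz
    """
    # Discard the first 200 ms to skip transients, then remove the DC component.
    x = np.asarray(signal, dtype=float)[int(round(skip_s * fs)):]
    x = x - x.mean()
    n = x.size

    # Zero-pad so that the spacing between frequency bins is below df_max.
    nfft = max(int(2 ** np.ceil(np.log2(fs / df_max))), n)
    spec = np.fft.rfft(x, nfft) * 2.0 / n  # 2/N amplitude normalisation
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)

    # Search the maximum in a band centred on the expected frequency.
    band = np.flatnonzero(np.abs(freqs - f_expected) <= search_hz)
    k = band[np.argmax(np.abs(spec[band]))]

    # Parabolic interpolation through the three bins around the maximum.
    a, b, c = np.abs(spec[k - 1:k + 2])
    delta = 0.5 * (a - c) / (a - 2.0 * b + c)
    amplitude = b - 0.25 * (a - c) * delta
    frequency = (k + delta) * fs / nfft

    # Phase is read at the peak bin, relative to the start of the kept segment.
    phase = np.angle(spec[k])
    return frequency, amplitude, phase
```

The returned phase is referred to the start of the analysed segment, so only the phase difference between two signals recorded simultaneously (for instance the microphone signal and the source voltage) is meaningful.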
The first 200 ms are removed to avoid transient phenomena, then the Fourier transform of the signal is computed. The spectrum amplitude is normalised by multiplying it by 2/N (N being the number of samples of the analysed signal). The Fourier transform is computed using zero-padding to get a frequency resolution finer than 0.1 Hz. The maximum of the spectrum is searched in a frequency band centered on the expected signal frequency. The frequency and the phase corresponding to this maximum are then extracted. A parabolic interpolation is performed on the 3 points closest to the maximum to get a better estimate of the amplitude (the point having the maximal amplitude is not necessarily the maximum of the Fourier transform).

3. Theory
3.1 Plane wave theory
3.1.1 One tube
The wave field produced by a source located in a single uniform tube can be described at any abscissa $x$ by an acoustic pressure and a flow of the following form (a time factor $e^{-j\omega t}$ is understood throughout this part):

$$P = A\left(e^{-jkx} + R\,e^{jkx}\right), \qquad U = Z_c^{-1} A\left(e^{-jkx} - R\,e^{jkx}\right) \tag{1}$$

where $A$ is an amplitude factor, $R$ is a reflection coefficient, $k = 2\pi f/c$ is the wave number ($f$ being the frequency and $c$ the sound speed) and $Z_c = \rho c/S$ is the characteristic impedance ($\rho$ being the air density and $S$ the tube cross section area). At the exit of the tube (abscissa $x = 0$) the radiation impedance $Z_r$ gives the following boundary condition:

$$\frac{P(0)}{U(0)} = Z_r = \frac{1 + R}{Z_c^{-1}(1 - R)} \tag{2}$$

This equation gives the expression of the reflection coefficient $R$:

$$R = \frac{Z_r/Z_c - 1}{Z_r/Z_c + 1} \tag{3}$$

At the source ($x = x_s$), the following condition is satisfied:

$$U(x_s) = U_s \tag{4}$$

where $U_s$ is the amplitude of the acoustic source. This leads to the expression of $A$:

$$A = \frac{U_s Z_c}{e^{-jkx_s} - R\,e^{jkx_s}} \tag{5}$$

The viscothermal losses can be taken into account using a complex wavenumber.

3.1.2 Two tubes
The wave field produced by a source located in a segmented two tube waveguide can be described at any abscissa $x$ by the following equations:

$$\begin{aligned} P_1 &= A_1\left(e^{-jkx} + R_1 e^{jkx}\right), & U_1 &= Z_{c1}^{-1} A_1\left(e^{-jkx} - R_1 e^{jkx}\right) \\ P_2 &= A_2\left(e^{-jkx} + R_2 e^{jkx}\right), & U_2 &= Z_{c2}^{-1} A_2\left(e^{-jkx} - R_2 e^{jkx}\right) \end{aligned} \tag{6}$$

where index 1 refers to tube 1 and index 2 to tube 2 (see figure 5). The continuity of pressure and the conservation of the acoustic flow at the junction (abscissa $x = 0$) give two equations:

$$A_1\left[1 + R_1\right] = A_2\left[1 + R_2\right] \tag{7a}$$

$$S_1 A_1\left[1 - R_1\right] = S_2 A_2\left[1 - R_2\right] \tag{7b}$$

where $S_1$ and $S_2$ are the cross section areas of tubes 1 and 2. Dividing (7b) by $S_1$ and adding the result to (7a) gives a relationship between $A_1$ and $A_2$:

$$A_1 = \frac{1}{2} A_2 \left[1 + R_2 + \frac{S_2}{S_1}\left(1 - R_2\right)\right] = C A_2 \tag{8}$$

Dividing (7a) by (7b) leads to a relationship between $R_1$ and $R_2$ (which is equivalent to writing the equality of the impedances on both sides of the junction):

$$R_1 = \frac{S_1(1 + R_2) - S_2(1 - R_2)}{S_1(1 + R_2) + S_2(1 - R_2)} \tag{9}$$

The reflection coefficient $R_2$ can be found from the boundary condition at the exit:

$$P_2(l_2) = Z_r U_2(l_2) \tag{10}$$

Thus:

$$R_2 = e^{-2jkl_2}\,\frac{Z_r/Z_{c2} - 1}{Z_r/Z_{c2} + 1} \tag{11}$$

At the source, assuming that the source is located inside tube 1, the following condition is satisfied:

$$U_1(x_s) = Z_{c1}^{-1} A_1\left(e^{-jkx_s} - R_1 e^{jkx_s}\right) = U_s \tag{12}$$

where $U_s$ is the amplitude of the source flow. This leads to the following expression for the amplitude factor $A_1$:

$$A_1 = \frac{U_s Z_{c1}}{e^{-jkx_s} - R_1 e^{jkx_s}} \tag{13}$$

If the source is located inside tube 2, the following relationship holds:

$$A_2 = \frac{U_s Z_{c2}}{e^{-jkx_s} - R_2 e^{jkx_s}} \tag{14}$$

4. Transfer function measurement
4.1 Transfer function measurement method
The pressure $s_a$ is first measured for each frequency at the first point of coordinate $a$, then the pressure $s_b$ is measured at the second point of coordinate $b$ (see figure 6).
During each measurement the supply voltage $s_0$ is measured at the same time in order to have a phase reference. So the acquired signals are:

$$\begin{aligned} s_0 &= A_0 e^{j\varphi_0} \quad\text{and}\quad s_a = A_a e^{j\varphi_a} \\ s_0 &= A_0 e^{j\varphi_0} \quad\text{and}\quad s_b = A_b e^{j\varphi_b} \end{aligned} \tag{15}$$

Both transfer functions $H_{0a}$ and $H_{0b}$ between the supply voltage and the pressure at the measurement points are then estimated. To obtain the modulus, the amplitude of the signal measured by the microphone is divided by the supply voltage amplitude. The phase is obtained by computing the phase shift between the signal measured by the microphone and the supply voltage. So the transfer functions $H_{0a}$ and $H_{0b}$ are:

$$H_{0a} = \frac{A_a}{A_0}\,e^{j(\varphi_a - \varphi_0)}, \qquad H_{0b} = \frac{A_b}{A_0}\,e^{j(\varphi_b - \varphi_0)} \tag{16}$$

The transfer function $H_{ab}$ between the measurement points $a$ and $b$ is obtained as the ratio $H_{0b}/H_{0a}$. The transfer function $H_{0a}$ corresponds to the product of the transfer functions of the acoustic source, the propagation of sound from the source to the point $x_a$, the probe, the microphone and the microphone conditioner. If the experimental conditions are exactly the same for the measurement of $H_{0a}$ and $H_{0b}$, the transfer function $H_{0b}$ is the product of $H_{0a}$ and the sought transfer function $H_{ab}$. Thus we have:

$$H_{ab} = \frac{H_{0b}}{H_{0a}} \tag{17}$$

The transfer function of the whole measurement system is thus eliminated by this computation. Although somewhat time-consuming, this method gives good-quality results because enough energy is supplied at each frequency to make a reliable measurement.

4.2 Problems encountered when trying to perform measurements near the source
Transfer function estimation was first performed between the entrance and the exit of the one tube replica. The frequency range was chosen below the cutoff frequency of the first transverse mode (about 7000 Hz), so the planar mode was expected to be predominant. However, the transfer functions obtained were not in agreement with plane wave theory.
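As a rough check of the cutoff frequency quoted above, the sketch below evaluates the cutoff of the first non-planar mode of a rigid-walled circular duct, f_c = 1.8412 c / (π d), together with the attenuation of that mode below cutoff. The rigid-wall circular-duct formula, the sound speed of 343 m/s and the function names are assumptions made for illustration; the report does not state how its value of about 7000 Hz was obtained.

```python
import math

# First zero of the derivative of the Bessel function J1: it sets the cutoff
# of the first non-planar mode of a rigid-walled circular duct.
J1_PRIME_FIRST_ZERO = 1.8412


def first_cutoff_frequency(diameter_m, c=343.0):
    """Cutoff frequency (Hz) of the first non-planar mode; c is an assumed sound speed."""
    return J1_PRIME_FIRST_ZERO * c / (math.pi * diameter_m)


def evanescent_attenuation_db_per_mm(f_hz, diameter_m, c=343.0):
    """Attenuation (dB/mm) of the first non-planar mode below its cutoff.

    Returns 0.0 at or above cutoff, where the mode propagates (losses ignored).
    """
    k = 2.0 * math.pi * f_hz / c                   # free-field wave number
    k_c = 2.0 * J1_PRIME_FIRST_ZERO / diameter_m   # transverse wave number
    if k >= k_c:
        return 0.0
    alpha = math.sqrt(k_c ** 2 - k ** 2)           # decay rate in Np/m
    return 20.0 * math.log10(math.e) * alpha / 1000.0


if __name__ == "__main__":
    print(first_cutoff_frequency(0.0295))
    print(evanescent_attenuation_db_per_mm(3000.0, 0.0295))
```

Under these assumptions the 29.5 mm tube has a cutoff of roughly 6.8 kHz, and at 3 kHz the evanescent mode decays by roughly 1 dB per millimetre, so it is confined to within a few centimetres of the discontinuity that excites it.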
The disagreement with plane wave theory arises because evanescent non-planar modes are not negligible at the tube extremities. A pressure measurement over a surface (20 mm × 10 mm) perpendicular to the tube axis, just in front of the communication hole, shows that the plane wave assumption is not valid at this place (see figure 7). One can see that close to the source the pressure perturbation can be important over a short distance. Even if the theory takes evanescent non-planar modes into account, errors due to probe location uncertainty remain critical. The neighborhood of the communication hole has therefore been avoided when estimating transfer functions.

4.3 Measured transfer functions
4.3.1 One tube replica
Three transfer functions have been measured with the previously described setup and method. A 160 W Sphynx SP-DYN-PRO2 acoustic source and a type 4182 B&K microphone with a 200 mm long and 1 mm wide probe have been used. The duct used was 170 mm long and had an internal diameter of 30 mm. It ended in a 300 mm wide and 400 mm long screen. The pressure has been measured at 3 points labeled 1, 2 and 3, respectively located at 120 mm, 80 mm and 40 mm from the duct entrance (see figure 8), at frequencies varying from 2 kHz to 10 kHz in steps of 50 Hz. The modulus and phase of the transfer function measured between points 1 and 3 are presented in figure 9. The theoretical transfer functions have been computed with a plane wave theory assuming that the duct ends in an infinite screen. Viscothermal losses are taken into account. The experimental results show a good agreement with theory.

Figure 9: Modulus and phase of the transfer function between two points located at 120 mm and 40 mm from the entrance of a duct which is 170 mm long and has an internal diameter of 30 mm. The dots have been obtained by measurement and the line is computed from the theory.

4.3.2 Two tubes replica
Six transfer functions have been measured on both centric and eccentric two tube replicas.
The measurements have been performed in two stages because the source used for generating high frequencies (above 2000 Hz) cannot be used for low frequencies. The transfer functions have therefore first been measured between 2000 Hz and 10000 Hz with a high frequency source (Eminence PSD:2002S-8) and then between 100 Hz and 2000 Hz with a low frequency source (Monacor KU-916T). The pressure has been measured at 4 points labeled 1, 2, 3 and 4 (see figure 10 and 11). The transfer functions between these points have then been computed. As an example, the one between points 1 and 2 is presented in figure 12. As one can see, the transfer functions of the two configurations are quite similar at low frequency (up to about 5000 Hz). This is no longer the case at high frequency. The most noticeable difference is the presence of maxima (for example at 7220 Hz and 7910 Hz) and minima (for example at 7060 Hz and 7500 Hz) above 7000 Hz for the eccentric case, which do not appear in the centric case. This difference is due to the fact that in the eccentric configuration the non-planar propagation modes are excited, whereas they are almost non-existent in the centric configuration. When the frequency is higher than the cutoff frequency of the non-planar modes, these modes can propagate and generate resonances other than the plane wave resonances. This is the reason for the additional maxima in the transfer function. The results obtained from FEM simulation are in agreement with these measurements and confirm this difference of behaviour between both configurations. For a comparison between experiment and FEM simulations the reader is referred to WP5 deliverable (D5.1 Simulation and Validation of VT sound with static geometries).

4.3.3 Vowels
Transfer function measurements have been performed on 3D printed vocal tract replicas. These replicas are a concatenation of cylinders corresponding respectively to vowels /a/, /i/ and /u/ (the area functions have been taken from [3]). All the cylinders share the same central axis.
The pressure has been measured at three locations inside and outside of these replicas (see figure 13), using the same acoustic sources as for the two tube replicas. Three transfer functions between these points have then been computed in the same way as for the two tube replicas. Three examples corresponding to the 3 vowels are displayed in figure 14. One does not notice important maxima at high frequency which could be the effect of transverse propagation modes, as was observed for the eccentric two-tubes replica. This result is logical since the axisymmetric configuration chosen does not allow the first transverse propagation modes to be generated. This will be confirmed by measurements on an eccentric replica of vowel /a/, which will soon be available.

5. Reflection coefficient estimation
5.1 The two microphone method
The measurement of transfer functions between two points inside a tube makes it possible to estimate the reflection coefficient. This method is called the two-microphone method [1] [2]. Considering that the pressure at each point is the sum of an incident wave and a reflected one, it can be expressed at points 1 and 2 by the following equations:

$$P_1 = e^{-jkx_1} + R\,e^{jkx_1}, \qquad P_2 = e^{-jkx_2} + R\,e^{jkx_2} \tag{18}$$

where $R$ is the reflection coefficient at the end of the tube. The transfer function between the two points is $H_{12} = P_2/P_1$. Replacing $P_1$ and $P_2$ by their expressions in (18) provides an expression of $R$:

$$R = \frac{e^{-jkx_2} - H_{12}\,e^{-jkx_1}}{H_{12}\,e^{jkx_1} - e^{jkx_2}} \tag{19}$$

This reflection coefficient can be compared to the theoretical one obtained from the radiation impedance $Z_R$ with the following expression:

$$R = \frac{Z_R/Z_c - 1}{Z_R/Z_c + 1} \tag{20}$$

A common way of representing the reflection coefficient is to plot its modulus and the length correction corresponding to the phase shift induced by the reflection. This length correction $\delta$ is given by the following relation:

$$R = -|R|\,e^{-j2k\delta} \tag{21}$$

5.2 Reflection coefficient estimation from experimental data
The transfer functions measured in the one tube replica have been used to estimate its reflection coefficient. The theoretical reflection coefficient has been estimated in two different ways:
• using the theoretical expression (20);
• by computing the pressure at points 1, 2 and 3 with expression (1), taking viscothermal losses into account. The same transfer functions are then estimated and the reflection coefficient is deduced from equation (19).
Both the theoretical values and the experimental values have been plotted in figure 15. The experimental values are close to the theoretical ones except for some values of ka (0.6, 1.2, 1.7 and 2.3).
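Equation (19) and the length correction of equation (21) can be sketched numerically as follows. The check generates the pressures at two points from equation (18) with a known reflection coefficient and recovers it from their ratio, in the spirit of the second approach listed above but without losses. The function names, the probe positions, the sound speed of 343 m/s and the value chosen for R are assumptions made for illustration.

```python
import numpy as np


def reflection_two_mic(h12, k, x1, x2):
    """Reflection coefficient from the transfer function H12 = P2/P1, equation (19)."""
    num = np.exp(-1j * k * x2) - h12 * np.exp(-1j * k * x1)
    den = h12 * np.exp(1j * k * x1) - np.exp(1j * k * x2)
    # den is proportional to sin(k (x1 - x2)): the estimate is ill-conditioned
    # when the spacing between the two points is a multiple of half a wavelength.
    return num / den


def length_correction(r, k):
    """Length correction delta such that R = -|R| exp(-2j k delta), equation (21)."""
    return -np.angle(-r) / (2.0 * k)


if __name__ == "__main__":
    k = 2.0 * np.pi * 3000.0 / 343.0          # lossless wave number at 3 kHz
    r_true = -0.8 * np.exp(-2j * k * 0.01)    # hypothetical R with delta = 10 mm
    x1, x2 = -0.05, -0.09                     # hypothetical abscissas, exit at x = 0
    p1 = np.exp(-1j * k * x1) + r_true * np.exp(1j * k * x1)
    p2 = np.exp(-1j * k * x2) + r_true * np.exp(1j * k * x2)
    r_est = reflection_two_mic(p2 / p1, k, x1, x2)
    print(abs(r_est), length_correction(r_est, k))
```

Because the denominator of equation (19) vanishes when k(x1 - x2) is a multiple of π, estimates taken near those frequencies are very sensitive to measurement errors; this is a general property of the method, independent of the data presented here.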
Taking viscothermal losses into account in the theoretical values reduces the differences between experiment and theory, indicating that viscothermal losses could be a plausible explanation of these differences.

6. Surface measurement
Measurements on surfaces at a given frequency have been performed to identify the kind of modes involved in the propagation inside the replicas. The frequencies have been chosen as close as possible to resonances and anti-resonances.

6.1 One tube replica
Measurements performed on the one tube replica show that at low frequency (up to about 5000 Hz) the acoustic field inside the replica (see figure 16) behaves as one-dimensional (plane waves), except at the ends of the tube where evanescent non-planar modes exist. At higher frequency, non-planar propagation modes can be observed (see figure 17). The mode observed in figure 17 is not expected for a perfectly axisymmetric geometry. It exists experimentally because the replica is not perfectly axisymmetric. This shows that modelling a vocal tract as a concatenation of cylinders sharing the same axis does not take this kind of mode into account, whereas a real vocal tract is never axisymmetric and this kind of mode is expected to exist.

6.2 Two tube replica
The same kind of measurement has been performed on the two tube replica. As for the one tube replica, one can see that at low frequency, in both centric and eccentric cases, plane wave theory describes the internal wave field well (see figure 18 and 20), except where evanescent non-planar modes are present. At high frequency one can see the effect of eccentricity. At the same frequency (7400 Hz), although non-planar modes can be detected in the centric configuration, one can see that these higher order modes are predominant in the eccentric case. This result illustrates that in a vocal tract geometry made of concatenated sections the eccentricity of each section has an influence.
This difference also confirms the assumptions made from the transfer functions.

Figure 18: Amplitude of acoustic pressure (P, in dB) measured inside and outside of a two centric tubes replica at 3060 Hz, as a function of x (m) and y (m).

Figure 19: Amplitude of acoustic pressure (P, in dB) measured inside and outside of a two centric tubes replica at 7400 Hz, as a function of x (m) and y (m).

At the zeros observed at high frequency the wave field is dominated by the non-planar modes in the eccentric case, whereas in the centric case the plane waves remain predominant. The same pressure patterns are obtained with FEM simulations, with the difference that for the centric case at high frequency no transverse mode can be seen, since the mesh used is perfectly symmetric. For a comparison between experiment and FEM simulations the reader is referred to WP5 deliverable (D5.1 Simulation and Validation of VT sound with static geometries).

7. Conclusion
The main challenge of this first year was to design, build and use a specific experimental set-up in order to measure accurately and extensively the acoustics of vocal tract replicas. A step-by-step procedure, starting with simple academic geometries, made it possible to optimise the set-up as well as the associated signal processing techniques.
In particular, by comparing the measured data with theoretical expectations, some spurious experimental artefacts have been suppressed or avoided. With this work completed, reliable and meaningful data could be shared with WP5 in order to validate the numerical simulations as well as to investigate the possible origin of some departures.

This work also allowed us to illustrate some important features that are seldom mentioned in the speech literature. The occurrence of higher acoustical modes is a spectacular example that affects both the internal and the radiated sound field. A careful study of this effect further underlines that the greatest care must be taken with the three-dimensional geometrical description of the vocal tract.

The experimental set-up has been successfully extended to deformable vocal tract replicas. During the practical work of Boris Mondet (Université Joseph Fourier), a single deformable tube was constrained by two plates driven by a stepper motor in order to generate and control a dynamic constriction. This will allow us to simulate slight to large vocal tract movements (articulation), in particular in view of the goals of years 2 and 3 of the present project.

Publications:
• Rémi Blandin, Xavier Pelorson, Annemie Van Hirtum, Rafaël Laboissière, Oriol Guasch and Marc Arnela (2014), "Effet des modes de propagation non plan dans les guides d'ondes à section variable", accepted at the 12th French Congress on Acoustics, to appear in proceedings.
• Xavier Pelorson, Annemie Van Hirtum, Boris Mondet, Oriol Guasch and Marc Arnela (2013), "Three-dimensional vocal tract acoustics", Acoustics 2013, November 10-15, New Delhi, India.
• Boris Mondet (2013), "Comportement acoustique du conduit vocal humain", rapport de stage, Juin-Juillet 2013, Université Joseph Fourier, Département Licence Sciences et Technologies.

REFERENCES
1 M Åbom and H Bodén. Error analysis of two-microphone measurements in ducts with flow. The Journal of the Acoustical Society of America, 83:2429, 1988.
2 AF Seybert and DF Ross. Experimental determination of acoustic properties using a two-microphone random-excitation technique. The Journal of the Acoustical Society of America, 61:1362, 1977.
3 BH Story. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. The Journal of the Acoustical Society of America, 123:327, 2008.
4 A Van Hirtum and Y Fujiso. Insulation room for aero-acoustic experiments at moderate Reynolds and low Mach numbers. Applied Acoustics, 73(1):72–77, 2012.

A Article : three dimensional vocal tract acoustics

THREE-DIMENSIONAL VOCAL TRACT ACOUSTICS

Xavier Pelorson, Annemie Van Hirtum, Boris Mondet
Gipsa-Lab, Département parole et Cognition, UMR CNRS UMR 5216 CNRS/INPG/UJF/Université Stendhal, 11 rue des Mathématiques F-38420 Saint Martin d'Hères, France
Oriol Guasch, Marc Arnela
GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 2, Barcelona 08022, Catalonia, Spain
e-mail: [email protected]

At present, the theoretical models used in speech synthesis as well as in speech analysis (such as inverse filtering, for instance) rely on low-frequency acoustic propagation models (one dimensional approximation) through one-dimensional vocal tract approximations (using an area function). The one dimensional approximation can be justified, to a certain extent, in the case of voiced sounds due to the low-frequency behavior of the glottal source and due to its position inside the vocal tract. This is not the case for plosives and fricatives, for which one can expect the generation and the propagation of higher acoustical modes.
These higher modes are then not only predominant inside a resonator but also have a spectacular effect on the radiated sound in terms of directivity. Based on anatomical considerations, one can estimate the first cut-on frequency of these higher acoustical modes to lie around 4-5 kHz, which is in the middle of a typical speech spectrum and close to the maximum of sensitivity of our ears. Perceptual effects of these higher acoustical modes can therefore be expected to be considerable. A theoretical model based on a modal approach is then presented as an alternative to plane-wave models.

1. Introduction
Classical textbooks on physical models of speech production [1], [2] describe the propagation of sound inside the vocal tract on the basis of a plane wave decomposition. Although the same textbooks clearly indicate that this description relies on a low frequency assumption, the limits of validity of the underlying theory are not clearly established.
As our knowledge of the sound sources and of the three dimensional vocal tract geometry [3] increases in complexity and in accuracy, in-vivo measurements and computer simulations clearly reveal spectacular departures from plane wave theory even at moderate frequencies (of the order of 5 kHz) [4]. As a plausible explanation for these departures, we first present a theoretical investigation of sound propagation inside a simplified vocal-tract like waveguide, focusing in particular upon the three dimensional effects due to the presence of higher acoustical modes. Results obtained using numerical simulations and measurements on replicas of the vocal tract will then be presented and discussed.

2. Theoretical aspects
We first consider the case of a uniform waveguide. Let $(O, x_1, x_2, x_3)$ be any coordinate system, $x_3$ being parallel to the waveguide axis. In the frequency domain, a general solution of the wave equation for the acoustic pressure $p$ might be sought in the form:

$$p = \sum_{m,n=0}^{\infty} \psi_{mn}(x_1, x_2)\left[A_{mn}\,e^{-\gamma_{mn} x_3} + B_{mn}\,e^{\gamma_{mn} x_3}\right] := \sum_{m,n=0}^{\infty} \psi_{mn}(x_1, x_2)\,P_{mn} \tag{1}$$

where $A_{mn}$ and $B_{mn}$ are two constants depending on the end conditions. The $(m,n)$ mode wave number $\gamma_{mn}$ as well as the eigenfunctions $\psi_{mn}$ depend on the geometry of the waveguide, on the hygrometry and on the boundary conditions at the wall. When the coordinate system $(x_1, x_2)$ is separable, the eigenfunctions $\psi_{mn}$ can be written in the form (2), with $f_m$ and $g_n$ two orthogonal functions and $N_{mn}$ a constant. $k_m$ and $k_n$ are the eigenvalues associated with the eigenfunctions $\psi_{mn}$.
ACOUSTIS2013NEWDELHI, New Delhi, India, November 10-15, 2013

Figure 1: Change of section between two waveguides i and i+1, with cross-sections S_i and S_{i+1} and axis x3.

Let $p^{(i)}$ and $p^{(i+1)}$ be the components of the acoustical pressure in section (i) and in section (i+1), respectively. Using modal decomposition one has:

$$p^{(i)} = \sum_{m,n=0}^{\infty} \psi_{mn}(x_1,x_2)\,P^{(i)}_{mn} \tag{4}$$

$$p^{(i+1)} = \sum_{p,q=0}^{\infty} \varphi_{pq}(x_1,x_2)\,P^{(i+1)}_{pq} \tag{5}$$

where $\psi_{mn}$ (respectively $\varphi_{pq}$) are the eigenfunctions associated with guide (i) (respectively (i+1)). Applying the continuity of pressure at the junction between the two guides, of cross-sections $S_i$ and $S_{i+1}$, gives:

$$P^{(i)}_{mn} = \frac{1}{S_i}\sum_{p,q=0}^{\infty} P^{(i+1)}_{pq} \int_{S_i} \varphi_{pq}(x_1,x_2)\,\psi^{*}_{mn}(x_1,x_2)\,dS \tag{6}$$

In a similar way, continuity of the velocity along x3 provides a second relationship between the velocity amplitudes $V^{(i+1)}_{pq}$ and $V^{(i)}_{mn}$:

$$V^{(i+1)}_{pq} = \frac{1}{S_{i+1}}\sum_{m,n=0}^{\infty} V^{(i)}_{mn} \int_{S_{i+1}} \varphi^{*}_{pq}(x_1,x_2)\,\psi_{mn}(x_1,x_2)\,dS \tag{7}$$

For a vocal tract geometry discretized using N sections, equations (6) and (7) thus form a system of N equations with N+2 unknowns. The specific boundary conditions at both ends of the vocal tract (at section 1 and section N) provide the last two equations. Equation (6) already points out an important geometrical effect. Because the first higher acoustical modes are antisymmetric, the resulting integral in (6) always equals zero when two sections share the same axis, as in figure 1.

All measurements were performed in a sound-insulated room.

3.2 Numerical simulations

To carry out the numerical simulations, the Finite Element Method (FEM) has been used to solve the acoustic wave equation in the time domain. In order to account for free-field propagation while keeping a computational domain of reasonable size, the domain has been surrounded with a Perfectly Matched Layer (PML), which avoids any spurious reflection at the domain boundaries.
The PML formulation developed in [6] has been adapted to the FEM framework and the resulting modified wave equation has been solved using an explicit time evolving scheme (see [7] for details of the implemented formulation). Each simulated duct system or vowel exits at a rigid baffle with dimensions 0.25 m x 0.25 m. The baffle constitutes one surface of a rectangular volume of 0.25 m x 0.25 m x 0.1 m in size, which allows sound waves emanating from the tube system to propagate towards infinity. As said, this volume is surrounded by a 0.1 m wide PML with a relative reflection coefficient of 10^-4. With regard to the boundary conditions, a frequency-independent boundary admittance µ = 0.0005 has been assigned at the duct walls to introduce some losses, and a sinusoid having the same frequency as that in the corresponding experimental test has been imposed at the duct entrance. The resulting computational domains have been meshed following the ten nodes per wavelength accuracy criterion [8].
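As a rough illustration of what the ten nodes per wavelength rule implies, the Python sketch below computes the largest admissible node spacing for a given maximum frequency. The maximum frequency of 10 kHz, the speed of sound of 343 m/s and the function name are assumptions chosen for the example; the actual mesh sizes used in the simulations are not stated in the text.

```python
def max_node_spacing(f_max_hz, c=343.0, nodes_per_wavelength=10):
    """Largest node spacing (metres) that still places
    `nodes_per_wavelength` nodes on the shortest wavelength c / f_max."""
    shortest_wavelength = c / f_max_hz
    return shortest_wavelength / nodes_per_wavelength

# Hypothetical upper frequency of interest: 10 kHz.
h = max_node_spacing(10_000.0)
print(f"node spacing <= {h * 1e3:.2f} mm")
```

Under these assumptions the node spacing should not exceed about 3.4 mm; a higher maximum frequency or a lower sound speed tightens the bound proportionally.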
Proper time step values have been chosen for each FEM mesh to fulfil a stability condition of the Courant-Friedrichs-Lewy type. The speed of sound has been computed using the temperature at which the experiments were performed. A numerical simulation lasting 25 ms has been carried out for each analyzed case, capturing the acoustic pressure within the tube and in the near field on a predefined grid with a spatial resolution of 0.002 m, to allow comparisons with experiments. The simulated transfer functions for both configurations are presented in figure 3.

Figure 3: FEM simulations of the transfer function of two connected tubes. Left curve: centered tubes; right curve: eccentric tubes.

This result confirms the theoretical expectation presented in figure 2. Some discrepancies can however be observed at high frequency, which can probably be attributed to the different radiation models.

5. Conclusions

This paper describes an extension of the plane wave theory to the case of 3-D vocal tract geometry. The theoretical model has been successfully compared with both FEM simulations and experimental data obtained on casts of vocal tracts. The 3-D effects appeared to be significant in the high frequency domain and depend strongly on the geometrical discretization. The appearance of higher acoustical modes led to zeros in the transfer function (at the cut-on frequency of these modes) and to extra resonances. Further, contrary to plane waves, these higher modes generate a highly directive sound pressure field. Because these effects occur in the higher frequency range, their relevance for vowels may be limited, since the glottal sound source is of a low-frequency nature. However, in the case of plosives or fricatives we can probably expect them to be significant. Additional experiments and simulations will be performed to confirm this conclusion.

6. Acknowledgements

This research has been supported by EU-FET grant EUNISON 308874.

REFERENCES
[1] Fant G. (1960). Acoustic Theory of Speech Production. Mouton, The Hague.
[2] Flanagan J.L. (1972). Speech Analysis Synthesis and Perception, 2nd Edition, Springer-Verlag, Berlin.
[3] Story B.H. (2008). Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. Journal of the Acoustical Society of America, 123, 327-335.
[4] Elmasri S., Pelorson X., Saguet P., Badin P. (1998). The use of the Transmission Line Matrix in acoustics and in Speech. International Journal of Numerical Modeling, 11, 133-151.
[5] Laboissière R., Yehia H.C., Pelorson X. Higher order modes propagation in the human vocal tract, Proceedings of Acoustics 2012, Nantes, France.
[6] Grote M. and Sim I. (2010).
[7] Finite element computation of elliptical vocal tract impedances using the two-microphone transfer function method, Journal of the Acoustical Society of America, 133 (6), 4197-4209.
[8] Ihlenburg F. (1998). Finite Element Analysis of Acoustic Scattering, Applied Mathematical Sciences, Springer, Berlin, Chap. 2.
