Audibility of lossy compression in music recordings at various bit rates

Agata Rogowska
Institute of Radioelectronics, Warsaw University of Technology, Warsaw, Poland

Jan Żera
Institute of Radioelectronics, Warsaw University of Technology, Warsaw, Poland

Summary

The aim of the study was to determine the audibility of lossy compression introduced by the Vorbis, WMA, MP3-Fraunhofer and MP3-Lame codecs. In the experiments, the percentage of correct discrimination of compressed sound samples was determined as a function of the bit rate. In Experiment 1, discrimination curves were determined for three naïve subjects. In Experiment 2, the three naïve subjects performed a discrimination task at low bit rates of 32 and 48 kbps in background noise (traffic noise) to mimic typical conditions of listening to music played from a portable player. Experiment 3 was designed to measure the effect of subjects’ listening pre-training on lossy compression discrimination. This experiment involved 26 students of sound engineering and 32 naïve subjects. Samples of classical and pop music were used in all three experiments. The results showed that lossy compression becomes inaudible for bit rates of 96–128 kbps. The presence of background noise had no significant effect on discrimination. Experienced listeners had discrimination thresholds approx. 16 kbps lower than naïve subjects. The results demonstrated considerable differences between codecs, with the best performance for the WMA and MP3-Lame codecs and the poorest for the MP3-Fraunhofer codec.

PACS no. 43.60.Dh, 43.66.Lj

1. Introduction

The rapid growth of the digital music market along with a general availability of smartphones has encouraged listeners to use music streaming services instead of storing music files on their local devices. However, due to bandwidth limitations, online music streaming necessitates a significant reduction in the size of downloaded audio files.
For these purposes, lossy compression has proven more efficient than lossless compression, even though the former may irreversibly reduce the quality of the processed sound. The final size of audio files and the change in sound quality primarily depend on the compression rate. Higher compression rates allow for a greater reduction in file size, but at the same time they cause irreversible changes in sound quality and can render the compressed sound very different from its original. This raises the question of the compression rate at which the compressed sound becomes virtually indistinguishable from the uncompressed original.

Earlier work conducted on isolated sounds of musical instruments showed that the discrimination ability decreases quickly as the bit rate increases for sounds of selected musical instruments compressed with the MP3-Lame codec [1], and that the discrimination levels are significantly different for selected sounds compressed with the AC-3, E-AC-3 and HE-AAC codecs [2]. This study aimed to determine how the discrimination between compressed and uncompressed music samples depends on the compression rate or, equivalently, the bit rate. Tests were conducted on samples of classical and popular music [3, 4].

2. Test Material

The test material used in the study was prepared by encoding and decoding sound samples in four lossy compression formats (Vorbis [5], WMA [6], MP3-Fraunhofer [7] and MP3-Lame [8]) at different bit rates. Lossy compression was done at six bit rates of 32, 48, 64, 80, 96 and 128 kbps. These bit rates correspond to compression ratios ranging from about 44 down to 11. To preserve the variety of music genres in the listening tests, seven 4-s samples of different music pieces were used. The set of samples included classical music with the horn, the violin, the clarinet and the piano featuring as main instruments (four samples), and popular music of rock, bossa nova and soft pop (three samples).
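The quoted compression ratios of about 44 to 11 can be reproduced with a short calculation. The sketch below assumes that the uncompressed originals were 16-bit stereo PCM; only the sampling rate is stated in the text, so the bit depth and channel count (and all names in the code) are assumptions made for illustration.

```python
# Assumption: originals are 16-bit stereo PCM. The text states only the
# 44,100 Hz sampling rate; bit depth and channel count are illustrative.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16  # assumed
CHANNELS = 2          # assumed

# Bit rate of the uncompressed signal in kbps (1 kbps = 1000 bit/s).
PCM_KBPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

# Bit rates used in the study.
BIT_RATES_KBPS = [32, 48, 64, 80, 96, 128]


def compression_ratio(bit_rate_kbps: float) -> float:
    """Ratio of the uncompressed PCM bit rate to the encoded bit rate."""
    return PCM_KBPS / bit_rate_kbps


for rate in BIT_RATES_KBPS:
    print(f"{rate:>3} kbps -> about {compression_ratio(rate):.0f}:1")
```

Under these assumptions the uncompressed bit rate is 1411.2 kbps, which gives a ratio of about 44 at 32 kbps and about 11 at 128 kbps, in agreement with the range given above.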
The dBpoweramp Music Converter (free version) was used to encode and decode files with a sampling rate of 44,100 Hz. The material for the listening tests comprised 49 music samples (7 music samples × 6 bit rates, plus the originals) compressed by the WMA, MP3-Fraunhofer and MP3-Lame codecs, and 42 samples (5 bit rates) compressed by the Vorbis codec. All audio signals were equalized for duration and equivalent level. Differences in the level of all compressed sounds for a given music sample were below 0.1 dB. Variability in the level across music samples was smaller than 2 dB. In all experiments, the stimulus presentation level was set to an average of 84 dB SPL.

3. Listening Procedure

The ABX constant stimuli discrimination paradigm was used in the experiments. Listeners were presented with three signals: the compressed sound (A), the original sound (B) and the test sound X (A or B). The listener’s task was to indicate whether the test sound X was sound A or sound B. Stimulus presentation was balanced by presenting the stimuli in all possible orders an equal number of times within a presentation block. Signals A, B and X were separated by 500-ms silence intervals. Test trials were separated by 1,000-ms silence intervals in which listeners responded. All conditions, including the music sample, bit rate and ABX order, were randomized. This signal presentation was repeated for all codecs and all types of music samples. The ABX procedure allowed us to obtain response curves (psychometric functions) for discrimination of compressed vs uncompressed sound samples, with 100% correct discrimination at low bit rates and asymptotically reaching the 50% chance level at sufficiently high bit rates.

4. Results

Figures 1–2 show the discrimination scores for four codecs averaged over three subjects participating in Experiment 1. The subjects were students of Warsaw University of Technology (WUT) with no previous experience in listening tests or training in sound assessment.
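The ABX procedure of Section 3 and the percent-correct scores reported in this section can be illustrated with a minimal sketch. It is not the software used in the study. The enumeration of "all possible orders" (which of the first two intervals holds the compressed sound, and which sound X repeats) is an assumption, and the function and field names are hypothetical.

```python
import itertools
import random


def balanced_abx_block(repeats_per_order: int, rng: random.Random) -> list[dict]:
    """Build one block in which every presentation order occurs equally often."""
    # Assumed set of orders: the first interval is either sound, and X repeats
    # either sound, giving four orders in total.
    orders = itertools.product(("compressed", "original"), repeat=2)
    trials = []
    for first, x in orders:
        second = "original" if first == "compressed" else "compressed"
        trials.extend(
            {"first": first, "second": second, "x": x}
            for _ in range(repeats_per_order)
        )
    rng.shuffle(trials)  # randomize the order of trials within the block
    return trials


def percent_correct(trials: list[dict], responses: list[str]) -> float:
    """responses[i] is 'first' or 'second': the interval judged to match X."""
    hits = sum(trial[resp] == trial["x"] for trial, resp in zip(trials, responses))
    return 100.0 * hits / len(trials)


rng = random.Random(0)
block = balanced_abx_block(repeats_per_order=25, rng=rng)  # 100 trials

# A listener who always hears the difference, and one who only guesses.
perfect = ["first" if t["first"] == t["x"] else "second" for t in block]
guessing = [rng.choice(("first", "second")) for _ in block]

print(percent_correct(block, perfect))   # 100.0
print(percent_correct(block, guessing))  # close to the 50% chance level
```

The two simulated listeners mark the ends of the psychometric function: 100% when the compression is always audible, and scores scattered around 50% when it is not.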
The percentage of correct discrimination between compressed and original sound is plotted as a function of the bit rate. Each subject provided 100 responses at each bit rate. At the lowest bit rate of 32 kbps, both for classical and popular music, discrimination scores for all codecs exceeded 95% of correct responses. For the Vorbis codec, at the lowest bit rate available for this codec (48 kbps), 80% of responses were correct, whereas for the MP3-Fraunhofer codec the correct discrimination scores for all subjects exceeded 96%. This shows that compression at low bit rates is much less audible with the Vorbis codec than with the MP3-Fraunhofer codec. For all codecs, as the bit rate increased from 32 to 96 kbps the discrimination scores decreased to about 60% of correct responses. At 128 kbps the discrimination scores reached the nearly random level of 50% of correct responses, meaning that discrimination between compressed and uncompressed sound was not possible. It should be noted that a large intersubject variability was observed in the transition interval of 64–96 kbps.

Figure 1. Discrimination curves for four codecs. Average for three subjects (standard error = 0.9–8.7%). Classical music.

As regards popular music (Figure 2), all trends seen for classical music in Figure 1 were preserved, yet higher discrimination scores were obtained in the range of 48–96 kbps and the intersubject variability was smaller. Differences between codecs were similar to those observed for classical music, showing high discrimination of compression with the MP3-Fraunhofer codec and low discrimination of compression with the Vorbis or WMA codec.

Figure 2. Discrimination curves for four codecs. Average for three subjects (standard error = 1.4–12.7%). Popular music.

Data shown in Figures 1–2 represent the results of discrimination obtained in quiet, which is not typical of everyday listening to MP3 files. In Experiment 2, a background of traffic noise recorded on an underground metropolitan train was added to the listening tests. Only the lowest two bit rates of 32 and 48 kbps were studied, as these bit rates had sufficiently high discrimination in a quiet environment (cf. Figures 1–2) to show any degradation by the background noise. The results of measurements showed that at signal-to-noise ratios within the range of +4 to +16 dB the noise did not significantly affect the discrimination scores. In general, performance decreased by about 10 percentage points. It has to be noted, however, that due to experimental constraints the noise levels used in the listening tests were about 16 dB lower than levels usually experienced on metropolitan trains.

Figure 3 shows average discrimination in groups of untrained and trained subjects for the WMA and MP3-Fraunhofer codecs (Experiment 3). The group of untrained subjects comprised 32 WUT students who had no previous listening training. The group of trained subjects comprised 24 students from the Sound Engineering Department of the Fryderyk Chopin University of Music (FCUM), who had extensive professional training in listening to technically modified sounds and music. The subjects listened to the same sound samples as in Experiment 1 except that each subject listened to the sounds only once. A single measurement session lasted for about 60 minutes. In the case of classical music, FCUM students discriminated compressed sound noticeably better than inexperienced WUT students did (Figure 3). As regards popular music, experienced listeners also performed better but the difference from inexperienced listeners was not as large.
In general, differences in discrimination between experienced listeners (FCUM students) and inexperienced listeners (WUT students) corresponded to about a 16-kbps change in the bit rate.

5. Conclusions

The results of the experiments conducted in this study showed that, in all listening conditions, music samples compressed by lossy compression codecs proved indistinguishable from the original uncompressed samples for bit rates of approx. 96–128 kbps. Below the bit rate of 96 kbps there was a gradual increase in discrimination, with vast differences among the four codecs tested. Noticeable superiority (least discrimination) was seen for the WMA and Vorbis codecs. Within the levels tested, the traffic noise background did not degrade the listeners’ ability to discriminate compressed sound. It was also shown that the subjects’ training in listening noticeably enhanced their ability to discriminate sounds stored in audio files with lossy compression.

References

[1] C. Lee, A. Horner: Discrimination of MP3-Compressed Musical Instrument Tones, Journal of the Audio Engineering Society (2010), 58(6), 487–497.
[2] L. Gaston, G. Sanders: Evaluation of HE-AAC, AC-3 and E-AC-3 Codecs, Journal of the Audio Engineering Society (2008), 56(3), 140–155.
[3] A. Rogowska, J. Żera: Audibility of lossy compression in musical recordings, ISSET 2013, Cracow (2013).
[4] A. Rogowska, J. Żera: Discrimination of lossy compression in musical recordings by listeners with different auditory training, in Postępy Akustyki, L. Leniowska, A. Brański (Eds.), Polish Acoustical Society, 2013, pp. 448–445.
[5] http://www.xiph.org/vorbis/.
[6] "Windows Media 9 Series Capabilities and Benefits Overview" (DOC). International Narcotics Control Board. Retrieved 2007-08-16.
[7] www.iis.fraunhofer.de.
[8] http://lame.sourceforge.net/.