Audibility of lossy compression in music recordings at various bit rates

Agata Rogowska
Institute of Radioelectronics, Warsaw University of Technology, Warsaw, Poland
Jan Żera
Institute of Radioelectronics, Warsaw University of Technology, Warsaw, Poland
Summary
The aim of the study was to determine the audibility of lossy compression introduced by Vorbis,
WMA, MP3-Fraunhofer and MP3-Lame codecs. In the experiments, the percentage of correct
discrimination of compressed sound samples was determined as a function of the bit rate. In
Experiment 1, discrimination curves were determined for three naïve subjects. In Experiment 2,
the three naïve subjects performed a discrimination task at low bit rates of 32 and 48 kbps in
background noise conditions (traffic noise) to mimic typical conditions of listening to music
played from a portable player. Experiment 3 was designed to measure the effect of subjects’
listening pre-training in lossy compression discrimination. This experiment involved 26 students
of sound engineering and 32 naïve subjects. Samples of classical and pop music were used in all
three experiments. The results showed that lossy compression becomes inaudible for bit rates of
96–128 kbps. The presence of background noise had no significant effect on discrimination.
Experienced listeners had discrimination thresholds approx. 16 kbps higher than naïve subjects.
The results demonstrated considerable differences between codecs, with the best and poorest
performances shown by the WMA/Vorbis and MP3-Fraunhofer codecs, respectively.
PACS no. 43.60.Dh, 43.66.Lj
1. Introduction
The rapid growth of the digital music market
along with a general availability of smartphones
has encouraged listeners to use music streaming
services instead of storing music files on their
local devices. However, due to bandwidth
limitations, online music streaming necessitates a
significant reduction in the size of downloaded
audio files. For these purposes, lossy compression
has proven more efficient than lossless
compression, even though the former may
irreversibly reduce the quality of the processed
sound. The final size of audio files and the change
in sound quality primarily depend on the
compression rate. Higher compression rates allow
for greater size reduction of the compressed files
but at the same time they result in irreversible
changes in sound quality and render the
compressed sound very different from its original.
This raises the question of which compression
rate makes the compressed sound
virtually indistinguishable from the uncompressed
original. Earlier work on isolated
sounds of musical instruments showed that
discrimination ability decreases quickly as the bit
rate increases for selected instrument sounds
compressed with the MP3-Lame codec
[1], and that the discrimination levels are
significantly different for selected sounds
compressed with AC-3, E-AC-3 and HE-AAC
codecs [2].
This study aimed to determine how the
discrimination between compressed and
uncompressed music samples depended on the
compression rate or, equivalently, the bit rate.
Tests were conducted on samples of classical and
popular music [3, 4].
2. Test Material
Test material used in the study was prepared by
encoding and decoding sound samples in four
lossy compression formats—Vorbis [5], WMA
[6], MP3-Fraunhofer [7], MP3-Lame [8]—at
different bit rates. Lossy compression was done at
six bit rates of 32, 48, 64, 80, 96 and 128 kbps.
These bit rates correspond to a compression ratio
change from about 44 to 11.
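As a rough check of these ratios, the sketch below divides the bit rate of
the uncompressed signal by each coded bit rate. It assumes 16-bit stereo PCM
(CD quality): the text states only the 44,100 Hz sampling rate, so the bit
depth, the channel count and the function names are assumptions made for
illustration.

```python
# Assumption: the originals are 16-bit, 2-channel PCM. Only the
# 44,100 Hz sampling rate is given in the text.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16  # assumed
CHANNELS = 2          # assumed


def pcm_bit_rate_kbps(sample_rate_hz=SAMPLE_RATE_HZ,
                      bits_per_sample=BITS_PER_SAMPLE,
                      channels=CHANNELS):
    """Bit rate of the uncompressed PCM signal in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(coded_kbps):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return pcm_bit_rate_kbps() / coded_kbps


# Ratios for the six bit rates used in the study.
ratios = {rate: compression_ratio(rate) for rate in (32, 48, 64, 80, 96, 128)}
```

Under these assumptions the uncompressed bit rate is 1411.2 kbps, which gives
ratios of about 44 at 32 kbps and about 11 at 128 kbps.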
To preserve the variety of music genres in
listening tests, seven 4-s samples of different
music pieces were used. The set of samples
included classical music with the horn, the violin,
the clarinet and the piano featuring as main
instruments (four samples), and popular music of
rock, bossa nova and soft pop (three samples). The
dBpoweramp Music Converter (free version) was
used to encode and decode files with a sampling
rate of 44,100 Hz.
Material of the listening tests comprised 49
music samples (7 music samples × 6 bit rates and
the original) compressed by WMA, MP3-Fraunhofer and MP3-Lame codecs, and 42
samples (5 bit rates and the original) compressed by the Vorbis
codec. All audio signals were equalized for
duration and equivalent level. Differences in the
level of all compressed sounds for a music sample
were below 0.1 dB. Variability in the level of
various music samples was smaller than 2 dB. In
all experiments, stimuli presentation level was set
to an average of 84 dB SPL.
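The level equalization described above can be sketched as follows. This is
only an illustration: it assumes that "equivalent level" is the RMS level
computed over the whole sample, and the helper names are hypothetical. The
actual tool used for equalization is not stated in the text.

```python
import math


def rms_level_db(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("a silent signal has no defined level")
    return 10.0 * math.log10(mean_square)


def match_level(samples, target_db):
    """Scale the samples so that their RMS level equals target_db."""
    gain = 10.0 ** ((target_db - rms_level_db(samples)) / 20.0)
    return [x * gain for x in samples]


def level_difference_db(a, b):
    """Absolute difference in RMS level between two signals, in dB."""
    return abs(rms_level_db(a) - rms_level_db(b))
```

A decoded sample would be passed through `match_level` with the level of the
original as the target, and `level_difference_db` could then confirm that the
residual difference stays below the 0.1 dB tolerance.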
3. Listening Procedure
The ABX constant stimuli discrimination
paradigm was used in the experiments. Listeners
were presented with three signals: the compressed
sound (A), the original sound (B) and the test
sound X (A or B). The listener’s task was to
indicate whether the test sound X was identical to sound
A or B. Stimulus presentation was balanced by
presenting the stimuli in all possible orders for an
equal number of times in a presentation block.
Signals A, B and X were separated by 500-ms
silence intervals. Test trials were separated by
1,000-ms silence intervals for listeners to respond.
All conditions, including the music samples, bit
rate and ABX order, were randomized. Such
signal presentation was repeated for all codecs and
all types of music samples.
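A balanced ABX block of this kind might be generated as in the sketch below.
It assumes that "all possible orders" means the four combinations of which
stimulus is played first and which one is repeated as X; this reading, and
all names in the code, are assumptions. The timing helper uses the 4-s
samples, 500-ms gaps and 1,000-ms response interval given in the text.

```python
import itertools
import random

STIMULI = ("compressed", "original")


def abx_orders():
    """All (first, second, x) combinations for one pair of stimuli."""
    orders = []
    for first, second in itertools.permutations(STIMULI, 2):
        for x in (first, second):
            orders.append((first, second, x))
    return orders


def abx_block(repeats, seed=None):
    """Shuffled block in which every order occurs `repeats` times."""
    block = abx_orders() * repeats
    random.Random(seed).shuffle(block)
    return block


def trial_duration_s(sample_s=4.0, gap_s=0.5, response_s=1.0):
    """Three sounds, two gaps between them and one response interval."""
    return 3 * sample_s + 2 * gap_s + response_s
```

With the default values, one trial takes 14 s.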
The ABX procedure allowed us to obtain
response curves (psychometric functions) for
discrimination of compressed vs uncompressed
sound samples with 100% correct discrimination
at low bit rates and asymptotically reaching 50%
chance discrimination at sufficiently high bit rates.
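One simple way to describe a curve of this shape is a logistic function that
falls from 100% to 50% as the bit rate increases. The sketch below is an
illustrative model only: the logistic form and the `midpoint_kbps` and
`slope_kbps` parameters are assumptions, not a fit reported in this study.

```python
import math


def percent_correct(bit_rate_kbps, midpoint_kbps, slope_kbps):
    """Logistic psychometric function for a two-alternative ABX task.

    Returns values between 100% (low bit rates) and 50% (high bit rates).
    At bit_rate_kbps == midpoint_kbps the function equals 75%.
    slope_kbps must be positive; smaller values give a steeper curve.
    """
    z = (bit_rate_kbps - midpoint_kbps) / slope_kbps
    return 50.0 + 50.0 / (1.0 + math.exp(z))
```

In this parametrization the midpoint is the bit rate at which 75% of the
responses are correct, halfway between perfect and chance performance.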
4. Results
Figures 1–2 show the discrimination scores for
four codecs averaged over three subjects
participating in Experiment 1. The subjects were
students of Warsaw University of Technology
(WUT) with no previous experience in listening
tests or training in sound assessment. The
percentage of correct discrimination between
compressed and original sound is plotted as a
function of the bit rate. Each subject provided 100
responses at each bit rate. At the lowest bit rate of
32 kbps, both in the case of classical and popular
music, discrimination scores for all codecs
exceeded 95% of correct responses. For the Vorbis
codec, at the lowest bit rate available for this
codec (48 kbps) 80% of responses were correct,
whereas for the MP3-Fraunhofer codec the correct
discrimination scores for all subjects exceeded
96%. This shows that, at low bit rates, compression
with the Vorbis codec is much less audible than
compression with the MP3-Fraunhofer codec. For all codecs,
as the bit rate increased from 32 to 96 kbps the
discrimination scores decreased to about 60% of
correct responses. At 128 kbps the discrimination
scores reached a nearly random level of 50% of
correct responses, meaning that discrimination
between compressed and uncompressed sound was
not possible. It should be noted that large
intersubject variability was observed in the
transition interval of 64–96 kbps.
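To judge how far a score must be from 50% before it differs from guessing,
one option is an exact one-sided binomial test on the 100 responses a subject
gave at each bit rate. The sketch below is such a test; it is offered as an
illustration and is not necessarily the analysis used in this study.

```python
from math import comb


def p_value_above_chance(n_correct, n_trials, chance=0.5):
    """Probability of at least n_correct hits in n_trials by guessing."""
    return sum(
        comb(n_trials, k) * chance ** k * (1.0 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )
```

For a single subject, 60 correct responses out of 100 give a probability of
roughly 0.03 under pure guessing, so scores near 60% sit close to the limit
of what 100 trials can separate from chance.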
Figure 1. Discrimination curves for four codecs.
Average for three subjects (standard error = 0.9–8.7%).
Classical music.
As regards popular music (Figure 2), all trends
seen for classical music in Figure 1 were
preserved, yet higher discrimination scores were
obtained in the range of 48–96 kbps and the
intersubject variability was smaller. Differences
between codecs were similar to those observed for
classical music showing high discrimination of
compression with the MP3-Fraunhofer codec and
low discrimination of compression with the Vorbis
or WMA codec.
Figure 2. Discrimination curves for four codecs.
Average for three subjects (standard error = 1.4–12.7%).
Popular music.
Data shown in Figures 1–2 represent the results
of discrimination obtained in quiet, which is not
typical of everyday listening to MP3
files. In Experiment 2, the background of traffic
noise recorded on an underground metropolitan
train was added to the listening tests. Only the
lowest two bit rates of 32 and 48 kbps were
studied, as these bit rates had sufficiently high
discrimination in a quiet environment (cf. Figures
1–2) to show any degradation by the background
noise. The results of measurements showed that at
signal-to-noise ratios within the range of +4 to
+16 dB the noise did not significantly affect the
discrimination scores. In general, performance
decreased by about 10 percentage points. It has to
be noted, however, that due to experimental
constraints the noise levels used in the listening
tests were about 16 dB lower than levels usually
experienced on metropolitan trains.
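Setting a given signal-to-noise ratio amounts to scaling the noise relative
to the RMS value of the music sample. The sketch below shows one way to do
this digitally. How the noise was actually combined with the music in the
experiment is not described in the text, so the digital mixing and the
function names are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def noise_gain_for_snr(signal, noise, snr_db):
    """Linear gain for the noise that yields the requested SNR in dB."""
    return rms(signal) / (rms(noise) * 10.0 ** (snr_db / 20.0))


def mix_at_snr(signal, noise, snr_db):
    """Add the scaled noise to the signal, sample by sample."""
    gain = noise_gain_for_snr(signal, noise, snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]
```

Calling `mix_at_snr` with `snr_db` values of 4 and 16 covers the two ends of
the range of signal-to-noise ratios reported above.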
Figure 3 shows average discrimination over
groups of untrained and trained subjects for WMA
and MP3-Fraunhofer codecs (Experiment 3).
The group of untrained subjects comprised 32 WUT
students who had no previous listening training.
The group of trained subjects comprised 26
students from the Sound Engineering Department
of the Fryderyk Chopin University of Music
(FCUM), who had extensive professional training
in listening to technically modified sounds and
music. The subjects listened to the same sound
samples as in Experiment 1 except that each
subject listened to the sounds only once. A single
measurement session lasted for about 60 minutes.
In the case of classical music, FCUM students
discriminated compressed sound noticeably better
than inexperienced WUT students did (Figure 3).
As regards popular music, experienced listeners
also performed better but the difference from
inexperienced listeners was not so large. In
general, differences in discrimination between
experienced listeners (FCUM students) and
inexperienced listeners (WUT students) were such
that they corresponded to about a 16-kbps change
in the bit rate.
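A shift of this kind can be expressed as the difference between the bit
rates at which two discrimination curves cross a fixed criterion. The sketch
below uses linear interpolation and a 75% criterion; both choices are
assumptions. The two curves are placeholder values invented for illustration
and constructed so that the shift equals 16 kbps. They are not data from
this study.

```python
def threshold_kbps(bit_rates, percent_correct, criterion=75.0):
    """Bit rate at which a falling curve crosses the criterion."""
    points = list(zip(bit_rates, percent_correct))
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if p0 >= criterion >= p1 and p0 != p1:
            # Linear interpolation between the two neighbouring points.
            return r0 + (p0 - criterion) * (r1 - r0) / (p0 - p1)
    raise ValueError("the curve does not cross the criterion")


BIT_RATES = [32, 48, 64, 80, 96, 128]
# Placeholder curves (hypothetical, not measured values).
UNTRAINED = [98, 90, 70, 60, 55, 50]
TRAINED = [100, 98, 90, 70, 60, 52]

shift = threshold_kbps(BIT_RATES, TRAINED) - threshold_kbps(BIT_RATES, UNTRAINED)
```

For these placeholder curves the thresholds are 60 and 76 kbps, so `shift`
is 16 kbps.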
5. Conclusions
The results of experiments conducted in this study
showed that, in all listening conditions, music
samples compressed by lossy compression codecs
proved indistinguishable from the original
uncompressed samples for bit rates of approx.
96–128 kbps. Below the bit rate of 96 kbps there was
a gradual increase in discrimination with vast
differences among the four codecs tested.
Noticeable superiority (least discrimination) was
seen for WMA and Vorbis codecs. Within the
levels tested, traffic noise background did not
impair the listeners’ ability to discriminate
compressed sound. It was also shown that the
subjects’ training in listening noticeably enhanced
their ability to discriminate sounds stored in audio
files with lossy compression.
References
[1] C. Lee, A. Horner: Discrimination of MP3-Compressed
Musical Instrument Tones, Journal of the Audio
Engineering Society (2010), 58(6), 487–497.
[2] L. Gaston, G. Sanders: Evaluation of HE-AAC, AC-3
and E-AC-3 Codecs, Journal of the Audio Engineering
Society (2008), 56(3), 140–155.
[3] A. Rogowska, J. Żera: Audibility of lossy compression
in musical recordings, ISSET 2013, Cracow (2013).
[4] A. Rogowska, J. Żera: Discrimination of lossy
compression in musical recordings by listeners with
different auditory training, in Postępy Akustyki
L. Leniowska, A. Brański (Eds.), Polish Acoustical
Society, 2013, pp. 448–445.
[5] http://www.xiph.org/vorbis/.
[6] "Windows Media 9 Series Capabilities and Benefits
Overview" (DOC). Microsoft.
Retrieved 2007-08-16.
[7] www.iis.fraunhofer.de.
[8] http://lame.sourceforge.net/.