Audio Lifelogging

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
[email protected]
http://labrosa.ee.columbia.edu/
1. Audio Lifelogging
2. Speech
3. Environmental Sound Analysis
4. Medical Applications
Audio Lifelogging - Dan Ellis, 2014-01-29
Personal Audio Lifelogs
• Easy to record everything you hear
~250GB / year @ 64 kbps
• Very hard to find anything
how to scan?
how to visualize?
how to index?
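As a sanity check on the storage figure, here is a minimal sketch assuming continuous recording over a 365-day year and decimal gigabytes (1 GB = 10^9 bytes); the names `BITRATE_BPS` and `bytes_per_year` are illustrative only.

```python
# Back-of-envelope check of "~250GB / year @ 64 kbps".
# Assumptions: recording never stops, 365-day year, 1 GB = 10**9 bytes.

BITRATE_BPS = 64_000                  # 64 kbps, as on the slide
SECONDS_PER_YEAR = 365 * 24 * 3600    # 31,536,000 s

def bytes_per_year(bitrate_bps: int = BITRATE_BPS) -> int:
    """Bytes produced by one year of continuous recording at a constant bitrate."""
    return bitrate_bps * SECONDS_PER_YEAR // 8  # 8 bits per byte

if __name__ == "__main__":
    gb = bytes_per_year() / 1e9
    print(f"{gb:.1f} GB per year")  # prints "252.3 GB per year"
```

The exact value, about 252 GB, is consistent with the slide's rounded "~250GB".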
Speech Recognition
• Transcripts of discussions would be useful…
distant-mic / noisy ASR still quite limited
• CHiME (Comp. Hearing in Multisource Env.s), Barker et al. ’13
noise backgrounds recorded in a family home (living room)
14 h of audio in 0.5 to 1.5 h sessions over several weeks
well-defined application domain with a learnable ‘vocabulary’ and ‘grammar’
Best system: spatial enhancement, MLLT, SAT, LDA, f-bMMI, feature augmentation, bMMI noise-adaptive training, DLM and MBR decoding
[Figure: The 2nd ‘CHiME’ Challenge, Track 2 results: word error rate (%) vs. SNR (dB, −6 to 9) for ASR Baseline (reverb), ASR Baseline (noisy), TU Tampere & KU Leuven, TU Munich, TUT, KUL & BMW, FBK−Irst & INESC−ID, and Mitsubishi Electric]
http://spandh.dcs.shef.ac.uk/chime_workshop/slides/CHiME13_overview.pdf
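The results plot is scored in word error rate. A minimal sketch of the standard definition, (substitutions + deletions + insertions) / reference words, computed as a word-level Levenshtein distance; this is not the CHiME scoring tool, and the function name is hypothetical. Real scorers also normalize text (case, punctuation) before alignment.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, via word-level edit distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    # One deletion ("the") and one insertion ("please") over 4 reference words.
    print(word_error_rate("turn the lights on", "turn lights on please"))  # 0.5
```

Multiplying by 100 gives the percentage plotted on the vertical axis.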
Environmental Sound Classification
Ellis, Zheng, McDermott ’11
• Classify soundtracks with “texture” features
[Figure: texture feature pipeline. Sound → automatic gain control → mel filterbank (18 chans) → subband envelopes, then three branches: histogram → mean, var, skew, kurt (18 x 4); FFT → octave bins 0.5, 1, 2, 4, 8, 16 Hz → modulation energy (18 x 6); envelope correlation → cross-band correlations (318 samples)]
• Mixed results…
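Two of the texture statistics in the pipeline, per-band moments and cross-band correlations, can be sketched as follows. This assumes the subband envelopes are already computed (one equal-length list per mel band). It omits automatic gain control, the filterbank, and the modulation-energy branch, and all function names are hypothetical. Kurtosis here is the plain fourth standardized moment, not excess kurtosis.

```python
import math
from typing import Dict, List, Sequence

def band_moments(env: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, skewness and kurtosis of one subband envelope."""
    n = len(env)
    mean = sum(env) / n
    var = sum((v - mean) ** 2 for v in env) / n
    if var == 0.0:
        # Flat envelope: standardized moments are undefined, so report zeros.
        return {"mean": mean, "var": 0.0, "skew": 0.0, "kurt": 0.0}
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in env) / n
    kurt = sum(((v - mean) / std) ** 4 for v in env) / n
    return {"mean": mean, "var": var, "skew": skew, "kurt": kurt}

def band_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two subband envelopes (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    na = math.sqrt(sum((x - ma) ** 2 for x in a))
    nb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (na * nb) if na and nb else 0.0

def texture_stats(envelopes: List[Sequence[float]]):
    """Per-band moments (bands x 4) and upper-triangle cross-band correlations."""
    moments = [band_moments(e) for e in envelopes]
    corrs = [band_correlation(envelopes[i], envelopes[j])
             for i in range(len(envelopes))
             for j in range(i + 1, len(envelopes))]
    return moments, corrs
```

With 18 bands, as on the slide, the moments form an 18 x 4 set of features per clip.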
Segmentation & Clustering
Lee & Ellis ’04
• Features from 1 min windows
• Segmentation by local statistics
• Cluster across whole set
[Figure: manual labels of each class on a 09:00 to 18:00 timeline for 2004-09-13 and 2004-09-14, e.g., preschool, cafe, office, outdoor, lecture, lab, meeting2, compmtg, DSP03, L2, postlec, plus personal names (Ron, Manuel, Mike, Lesser, Arroyo?, Sambarta?)]
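One simple way to segment by local statistics is to compare the mean feature vector just before and just after each candidate point and place a boundary at peaks of that contrast. The sketch below swaps in this mean-difference detector for illustration; it is not necessarily the statistic used by Lee & Ellis ’04. It assumes one feature vector per 1-minute window, and `window` and `threshold` are hypothetical tuning parameters.

```python
import math
from typing import List, Sequence

def _mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]

def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def segment_boundaries(features: Sequence[Sequence[float]],
                       window: int = 5,
                       threshold: float = 1.0) -> List[int]:
    """Return indices t where a new segment starts (boundary between t-1 and t).

    features: one feature vector per 1-minute window.
    window, threshold: hypothetical tuning parameters.
    """
    scores = []
    for t in range(window, len(features) - window + 1):
        left = _mean_vector(features[t - window:t])    # statistics just before t
        right = _mean_vector(features[t:t + window])   # statistics from t onward
        scores.append((t, _euclidean(left, right)))
    boundaries = []
    for i, (t, s) in enumerate(scores):
        prev_s = scores[i - 1][1] if i > 0 else -math.inf
        next_s = scores[i + 1][1] if i + 1 < len(scores) else -math.inf
        # Keep only local peaks of the left/right contrast above the threshold.
        if s >= threshold and s >= prev_s and s > next_s:
            boundaries.append(t)
    return boundaries

if __name__ == "__main__":
    # Ten "quiet" minutes followed by ten "loud" minutes (1-D features).
    feats = [[0.0]] * 10 + [[5.0]] * 10
    print(segment_boundaries(feats, window=3))  # [10]
```

The resulting segments could then be summarized (e.g., by their mean feature vectors) and clustered across the whole set.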
Medical Applications
• Tracking behavior & symptoms
coughing
sleep disturbances
• Tracking hospital interactions
correlate recordings
Privacy
• Privacy-preserving features
scramble audio over 200ms windows
[Figure: spectrograms, freq / kHz (0 to 4) vs. time / s (0 to 14), level / dB (−60 to 20): Original (dan+kean-ex.wav) and Scrambled (200ms wins over 1s)]
• Self-audio only
augmented-reality
earphones/mics
Hearium
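The scrambling shown in the figure (200 ms windows shuffled within 1 s) can be sketched as below. This is a minimal illustration assuming a mono signal given as a list of samples; the function name and `seed` parameter are hypothetical. Each window's short-term content is kept, while the order of windows inside each 1 s block is lost.

```python
import random
from typing import List, Optional, Sequence

def scramble(samples: Sequence[float], sr: int,
             win_s: float = 0.2, block_s: float = 1.0,
             seed: Optional[int] = None) -> List[float]:
    """Randomly permute win_s-second windows within each block_s-second block."""
    rng = random.Random(seed)
    win = max(1, int(round(win_s * sr)))          # samples per window (200 ms)
    block = max(win, int(round(block_s * sr)))    # samples per block (1 s)
    out: List[float] = []
    for start in range(0, len(samples), block):
        chunk = list(samples[start:start + block])
        windows = [chunk[i:i + win] for i in range(0, len(chunk), win)]
        rng.shuffle(windows)  # windows never move outside their own block
        for w in windows:
            out.extend(w)
    return out
```

Because windows are cut with hard edges, the output will contain discontinuities at window joins; a practical version would probably crossfade between windows.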
Summary
• Personal Audio Lifelogs
Too good to waste!
• Speech & Nonspeech Content
Both useful in different ways
• Applications
Personal information, behavior measurement