1. Personal Audio 2. Features 3. Segmentation 4. Clustering 5. Future Work



1 Segmenting and Classifying Long-Duration Recordings of Personal Audio
  Dan Ellis and Keansub Lee
  Laboratory for Recognition and Organization of Speech and Audio, Dept. Electrical Eng., Columbia Univ., NY USA
  1. Personal Audio 2. Features 3. Segmentation 4. Clustering 5. Future Work

2 1. Personal Audio
  Easy to record everything you hear
    <2GB / week @ 64 kbps
  Very hard to find anything
    how to scan?
    how to visualize?
    how to index?
  Need automatic analysis
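The storage figure can be checked with a few lines of arithmetic. This is only a sketch: the function name `weekly_bytes` is hypothetical, and the recording duty cycle is an assumption taken from the data set slide (62 hours over 8 days), not something the slide states for a typical week.

```python
def weekly_bytes(kbps, hours_per_day, days_per_week=7):
    """Bytes needed to store one week of audio at a constant bit rate."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 * hours_per_day * days_per_week

# Assumed duty cycle: 62 hours recorded over 8 days (from the data set slide).
hours_per_day = 62 / 8
total = weekly_bytes(64, hours_per_day)
print(f"{total / 1e9:.2f} GB per week")  # about 1.56 GB
```

Under that assumption the total stays below the 2GB per week quoted on the slide; recording around the clock at the same bit rate would need roughly three times as much.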

3 Applications
  Automatic appointment-book history
    fills in when & where of movements
  Life statistics
    how long did I spend in meetings this week vs. last
    most frequent conversations
    favorite phrases??
  Retrieving details
    what exactly did I promise?
    privacy issues...
  Nostalgia?

4 Data Set
  Starting point: Collect data
    62 hours recorded (8 days, ~7.5 hr/day)
    hand-mark 139 segments (26 min/seg avg.)
    assign to 16 classes (11 have multiple instances)
  Table (Label, total mins, total segs): Library Campus 70 6 Restaurant 60 Bowling Lecture Car/Taxi 16 7 Street

5 2. Features
  Long duration recordings may benefit from longer basic time-frames
    60s rather than ms?
  Perceptually-motivated features
    broad spectrum + some detail?
  For diary application...
    background more important than foreground?
    smooth out uncharacteristic transients

6 Feature sets
  [Figure: six feature panels, each plotted as freq / bark against time / min: Average Linear Energy, Normalized Energy Deviation, Average Log Energy (dB), Log Energy Deviation (dB), Average Spectral Entropy (bits), Spectral Entropy Deviation (bits)]
  Capture both average and variation
  Capture a little more detail in subbands...
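One way to picture "average and variation" over long time-frames is to pool a short-time, per-band energy matrix into blocks of about a minute and keep the mean and standard deviation of each block. The sketch below assumes an input `E` of shape (frames, bands) holding linear energies; the function names, the dictionary keys, and the use of std/mean for the normalized deviation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def block_stats(F, frames_per_block):
    """Mean and standard deviation of each band over consecutive blocks of frames."""
    n_blocks = F.shape[0] // frames_per_block
    B = F[: n_blocks * frames_per_block].reshape(n_blocks, frames_per_block, F.shape[1])
    return B.mean(axis=1), B.std(axis=1)

def long_frame_features(E, frames_per_block, eps=1e-12):
    """Energy features per long frame from linear energies E (frames x bands)."""
    mu_lin, sd_lin = block_stats(E, frames_per_block)
    mu_db, sd_db = block_stats(10.0 * np.log10(E + eps), frames_per_block)
    return {
        "avg_linear_energy": mu_lin,
        "norm_energy_dev": sd_lin / (mu_lin + eps),  # assumed to be std / mean
        "avg_log_energy_db": mu_db,
        "log_energy_dev_db": sd_db,
    }
```

The same `block_stats` helper could be applied to a per-band spectral entropy matrix to obtain the two entropy features; trailing frames that do not fill a whole block are dropped.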

7 Spectral Entropy
  Auditory spectrum:
    $$A[n,j] = \sum_{k=0}^{N_F} w_{jk} X[n,k]$$
  Spectral entropy, the peakiness of each band:
    $$H[n,j] = -\sum_{k=0}^{N_F} \frac{w_{jk} X[n,k]}{A[n,j]} \log \frac{w_{jk} X[n,k]}{A[n,j]}$$
  [Figure: FFT spectral magnitude and Auditory Spectrum (energy / dB) and per-band Spectral Entropies (rel. entropy / bits), against freq / Hz]
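The per-band entropy can be sketched directly in NumPy. This reads the slide as defining A[n,j] as the weighted sum of FFT magnitudes in band j, and H[n,j] as the entropy of the normalized weighted magnitudes within that band. The base-2 logarithm is an assumption (the figure axis is in bits), as are the function name and the array shapes.

```python
import numpy as np

def spectral_entropy(X, W, eps=1e-12):
    """Auditory spectrum A and per-band spectral entropy H (in bits).

    X: (frames, bins) non-negative FFT magnitudes.
    W: (bands, bins) non-negative band weights w_jk.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    A = X @ W.T                                   # (frames, bands)
    # Share of each bin within its band; each (frame, band) row sums to 1.
    P = (X[:, None, :] * W[None, :, :]) / np.maximum(A, eps)[:, :, None]
    # log2 only where P > 0, so empty bins contribute zero entropy.
    logP = np.log2(P, out=np.zeros_like(P), where=P > 0)
    H = -(P * logP).sum(axis=2)
    return A, H
```

A flat band gives the maximum entropy (log2 of the number of bins in the band), while a band dominated by a single peak gives an entropy near zero, which matches the "peakiness" reading.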

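8 3. BIC segmentation
  BIC (Bayesian Information Criterion): Compare more and less complex models
    $$\mathrm{BIC} = \log \frac{L(X_1; M_1)\, L(X_2; M_2)}{L(X; M_0)} - \frac{\lambda}{2} \log(N)\, \Delta\#(M)$$
    where \Delta\#(M) is the number of extra parameters in the two-model description
  For segmentation:
    Grow context window from current boundary
    For each window, test every possible segmentation
    When BIC is positive, mark new segment
  [Figure: timeline from 0 to N showing the last segmentation point, a candidate boundary, and the current context limit; L(X_1; M_1) and L(X_2; M_2) model the two sides of the candidate boundary, L(X; M_0) the whole window]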
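The growing-window procedure can be sketched as follows. The slide does not say what the models M are; this sketch assumes each one is a single full-covariance Gaussian fit by maximum likelihood. The function names, the `margin`, `start_len` and `grow` parameters, and the small covariance regularizer are all illustrative assumptions.

```python
import numpy as np

def gauss_loglik(X):
    """Approximate max log-likelihood of X (n x d) under one full-covariance Gaussian."""
    n, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True).reshape(d, d) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    # At the ML estimate the quadratic term sums to about n * d.
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)

def bic_score(X, t, lam=1.0):
    """BIC gain of splitting window X at frame t versus modeling it as a whole."""
    n, d = X.shape
    extra_params = d + d * (d + 1) / 2  # one extra mean and covariance
    gain = gauss_loglik(X[:t]) + gauss_loglik(X[t:]) - gauss_loglik(X)
    return gain - 0.5 * lam * np.log(n) * extra_params

def best_boundary(X, margin=10, lam=1.0):
    """Test every split point in the window; return (t, score), t is None if no score > 0."""
    scores = [bic_score(X, t, lam) for t in range(margin, len(X) - margin)]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return best + margin, scores[best]
    return None, scores[best]

def segment(X, start_len=40, grow=10, margin=10, lam=1.0):
    """Grow the context window from the last boundary until a split passes BIC."""
    boundaries, last, end = [], 0, min(start_len, len(X))
    while end - last > 2 * margin:
        t, _ = best_boundary(X[last:end], margin, lam)
        if t is not None:
            last += t                      # new segment starts here
            boundaries.append(last)
            end = min(last + start_len, len(X))
        elif end == len(X):
            break                          # whole tail tested, no boundary
        else:
            end = min(end + grow, len(X))  # no boundary yet: widen the window
    return boundaries
```

Because a window is split as soon as any candidate passes, a boundary may be reported a frame or two away from the true change point when the change has only just entered the window. Larger values of `lam` make the test more conservative.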

9 BIC Segmentation Example
  [Figure: AvgLEnergy and AvgLogAudSpec feature panels with the BIC score below, over 13:30 to 16:00 (time / hr); annotations mark the last seg point, "no boundary found with shorter window", "boundary passes BIC", and the current window limit]
  No training or stored models

10 Segmentation Results
  Evaluate: 60hr hand-marked boundaries
    different features & combinations
  Correct Accept at False Accept = 2%:
    Feature                 Correct Accept
    µdB                     80.8%
    µH                      81.1%
    σH/µH                   81.6%
    µdB + σH/µH             84.0%
    µdB + σH/µH + µH        83.6%
    avg. mfcc               73.6%
  [Figure: Sensitivity vs. Specificity curves for µdB, µH, σH/µH, µdB + σH/µH, and µdB + µH + σH/µH]
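A figure such as "Correct Accept at False Accept = 2%" can be read off a set of scored candidate boundaries by fixing the threshold on the non-boundary scores. The sketch below assumes each candidate point already carries a score and a true/false label; how candidates were matched to the hand-marked boundaries is not stated on the slide, and the function name is hypothetical.

```python
import numpy as np

def correct_accept_at_fa(scores, is_boundary, fa_target=0.02):
    """Correct-accept rate at the threshold that gives the target false-accept rate."""
    scores = np.asarray(scores, dtype=float)
    is_boundary = np.asarray(is_boundary, dtype=bool)
    negatives = scores[~is_boundary]
    # Threshold exceeded by a fraction fa_target of the non-boundary scores.
    threshold = np.quantile(negatives, 1.0 - fa_target)
    correct_accept = np.mean(scores[is_boundary] > threshold)
    false_accept = np.mean(negatives > threshold)
    return correct_accept, false_accept, threshold
```

Sweeping `fa_target` from 0 to 1 and plotting the correct-accept rate against one minus the false-accept rate would give a sensitivity against specificity curve.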

11 4. Segment clustering
  Daily activity has lots of repetition:
    Automatically cluster similar segments
    affinity of segments as KL2 distances
  [Figure: segment affinity matrix with rows labeled supermkt, meeting, karaoke, barber, lecture2, billiard, break, lecture1, car/taxi, home, bowling, street, restaurant, library, campus; columns use abbreviations such as cmp, lib, rst, str]
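A minimal sketch of a KL2-based affinity, assuming each segment is summarized by a single full-covariance Gaussian over its feature frames. KL2 is taken here to mean the symmetric sum KL(p||q) + KL(q||p), for which the log-determinant terms cancel. The exponential mapping from distance to affinity and the `sigma` scale are assumptions, as are the function names.

```python
import numpy as np

def fit_gaussian(X):
    """Mean and (lightly regularized) covariance of the frames X (n x d)."""
    d = X.shape[1]
    cov = np.cov(X, rowvar=False, bias=True).reshape(d, d) + 1e-6 * np.eye(d)
    return X.mean(axis=0), cov

def kl2(mu_p, cov_p, mu_q, cov_q):
    """Symmetric KL divergence between two Gaussians."""
    d = len(mu_p)
    inv_p, inv_q = np.linalg.inv(cov_p), np.linalg.inv(cov_q)
    dm = mu_p - mu_q
    return 0.5 * (np.trace(inv_q @ cov_p) + np.trace(inv_p @ cov_q) - 2 * d
                  + dm @ (inv_p + inv_q) @ dm)

def affinity_matrix(segments, sigma=1.0):
    """Pairwise affinities exp(-KL2 / (2 sigma^2)) between segments."""
    models = [fit_gaussian(X) for X in segments]
    n = len(models)
    A = np.ones((n, n))  # each segment has affinity 1 with itself
    for i in range(n):
        for j in range(i + 1, n):
            dist = kl2(*models[i], *models[j])
            A[i, j] = A[j, i] = np.exp(-dist / (2 * sigma ** 2))
    return A
```

Segments recorded in similar conditions get affinities near 1, dissimilar ones near 0, which produces the block structure that the clustering step then looks for.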


12 Spectral Clustering
  Eigenanalysis of affinity matrix: A = U S V'
  SVD components: u_k s_kk v_k'
  [Figure: Affinity Matrix and its SVD components for k=1, k=2, k=3, ...]
  eigenvectors v_k give cluster memberships
  Number of clusters?
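The simplest reading of "eigenvectors give cluster memberships" is to take the SVD of the affinity matrix and assign each segment to the leading component in which it has the largest weight. The sketch below does exactly that. It is a simplification: practical spectral clustering usually runs k-means on the rows of the leading eigenvectors, and the slide does not say which variant was used. The function name and the choice of `k` are assumptions.

```python
import numpy as np

def spectral_labels(A, k):
    """Assign each row of affinity matrix A to one of the k leading SVD components."""
    U, s, Vt = np.linalg.svd(A)   # singular values come back in descending order
    V = Vt[:k].T                  # (n, k): leading right singular vectors v_k
    labels = np.argmax(np.abs(V), axis=1)
    return labels, s[:k]

# Toy affinity: three blocks (sizes 4, 3, 2) with weak cross-block affinity.
sizes = [4, 3, 2]
A = np.full((9, 9), 0.01)
start = 0
for size in sizes:
    A[start:start + size, start:start + size] = 1.0
    start += size

labels, strengths = spectral_labels(A, k=3)
print(labels)
```

This works cleanly when the blocks have distinct sizes, so the singular values are well separated. With near-equal singular values the leading vectors can mix clusters, which is one reason k-means on the rows is the more common choice.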

13 Clustering Results
  Clustering of automatic segments gives anonymous classes
    BIC criterion to choose number of clusters
    make best correspondence to 16 GT clusters
  [Figure: hand-marked boundaries; axes freq / Bark against clock time, 21:00 to 4:00]
  Frame-level scoring gives ~70% correct
    errors when same place has multiple ambiences
    clusters formed by strong foregrounds (voices)
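Frame-level scoring of anonymous clusters needs a mapping from cluster ids to ground-truth labels. The sketch below uses a greedy one-to-one matching on frame overlap counts, which only approximates a "best correspondence" (an optimal assignment would need something like the Hungarian algorithm). The function name and the input format, one cluster id and one label per frame, are assumptions.

```python
from collections import Counter

def frame_accuracy(cluster_ids, true_labels):
    """Fraction of frames correct under a greedy one-to-one cluster-to-label mapping."""
    overlap = Counter(zip(cluster_ids, true_labels))
    used_clusters, used_labels, correct = set(), set(), 0
    # Pair clusters with labels in order of decreasing frame overlap.
    for (cluster, label), count in overlap.most_common():
        if cluster not in used_clusters and label not in used_labels:
            used_clusters.add(cluster)
            used_labels.add(label)
            correct += count
    return correct / len(true_labels)

clusters = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]
truth = ["lib", "lib", "lib", "st", "st", "cam", "cam", "cam", "lib", "st"]
print(frame_accuracy(clusters, truth))  # 0.8
```

Frames in a cluster that was matched to a different label, or left unmatched, count as errors; this is how a place with several ambiences split across clusters lowers the score.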

14 5. Future Work
  Visualization / browsing / diary inference
    link in other information sources

15 Privacy
  Recording conversations conflicts with expectations of privacy
    critical barrier to progress
  Technical solutions to improve acceptance?
    Speaker/speech "search and destroy"
    scramble 0ms segs of speech (preserving longer-term statistics)
    high-confidence speaker ID to bypass

16 Conclusions
  Personal Audio is easy & cheap to collect
    but is it any use?
  Boundaries quite easy to spot
    moving to a new location
    change in activity (talking <> reading)
  Repeated activities can cluster together
    ... so user's labels can propagate
  Still gaining experience with the data
    speech is the most interesting part...
    ... but very hard to transcribe
    speaker ID, privacy, ...


Features for segmenting and classifying long-duration recordings of personal audio. Daniel P.W. Ellis and Keansub Lee, LabROSA, Dept. of Electrical Engineering, Columbia University, New York, NY 10027, USA. {dpwe,kslee}@ee.columbia.edu