Geometric and algebraic structures in pattern recognition


Luke Oeding, Department of Mathematics, University of California, Berkeley
April 30, 2012
Multimedia Pattern Recognition seminar of Rolf Bardeli, mmprec.iais.fraunhofer.de/bardeli
Partial support from NSF Award No. 0853000: International Research Fellowship Program (IRFP).

Something to solve by the end of the talk

What are the dimensions and degrees of the following two algebraic varieties? (Take the Zariski closure; ∗ = anything.)

[Display: X and Y, each written as a product of two ∗-matrices plus a sparse matrix with a prescribed pattern of zeros.]

(Thanks to Bernd Sturmfels and Zvi Rosen for these examples.)

What is a musical score?

What is a musical score?

A typical 4-minute song at 160 bpm, with a 32nd note as the smallest increment, gives 20480 places to start playing a note.
T := {1, ..., N}: the set of possible start times, ignoring duration (e.g. N = 20480).
A simplified piano has 88 keys, or pitches. F := {1, ..., k}: the set of possible notes (e.g. k = 88 notes).
The simplest score D is a collection of start times and notes.
Naively: D is a sparse matrix in {0,1}^{k×N}, indexed by F × T, since at any given time very few notes are played simultaneously.
Better: store D as a collection of events, D = {(p_i, t_i)}_i.
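To make the two representations concrete, here is a minimal Python sketch (my illustration, not from the talk; the pitch values and the scipy usage are assumptions):

```python
import numpy as np
from scipy.sparse import coo_matrix

k, N = 88, 20480                        # pitches x possible start times
events = [(60, 0), (64, 0), (67, 0),    # (pitch, start_time) pairs:
          (72, 512)]                    # a C major chord, then a high C

# Better: store the score as the event list above.
# Naive: a sparse 0/1 matrix D in {0,1}^(k x N).
pitches, times = zip(*events)
D = coo_matrix((np.ones(len(events)), (pitches, times)), shape=(k, N))
print(D.nnz, "notes out of", k * N, "possible (pitch, time) slots")
```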

Musical Databases

Let M = F × T, the set of musical notes.
A database D = (D_1, ..., D_n) is a collection of scores D_i ⊆ M, usually well annotated.
A query Q ⊆ M may be a short audio file or, if the file has already been pre-processed, a short collection of notes (a musical phrase).
Find the set of documents {D_i} such that Q ⊆ D_i (approximately).
To aid the search, try to find and exploit natural symmetries: group actions by scaling, translation, frieze symmetry, etc.

Question: Given an N × k matrix D with 0 < k << N, develop efficient methods (or improve current ones) to find useful symmetry among the rows.

[Bardeli, R., Similarity Search in Animal Sound Databases, IEEE Transactions on Multimedia (2009).]
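A naive version of the containment search, with the time-translation symmetry handled by normalizing the query; a sketch under the simplifying assumption that scores and queries are exact sets of (pitch, time) events:

```python
def occurs_in(query, score):
    """Does the query phrase occur in the score, up to a shift in time?
    query, score: sets of (pitch, start_time) pairs."""
    t0 = min(t for _, t in query)
    pattern = {(p, t - t0) for p, t in query}
    # Try aligning the normalized query against every start time in the score.
    return any(pattern <= {(p, t - s) for p, t in score}
               for s in {t for _, t in score})

score = {(60, 0), (64, 0), (67, 0), (72, 512)}
print(occurs_in({(60, 100), (64, 100)}, score))   # True: chord, shifted in time
```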

Finding a bird by its song

Which birds are living in this German marshland?


Finding a bird by its song

The traditional method: send groups of people (usually nature enthusiasts) out to the marsh with binoculars and a notepad to record which birds they spot or hear. The problem is that there are fewer and fewer enthusiasts who understand bird song, and those who remain are growing older and thus harder of hearing.

Finding a bird by its song

New method: send one person with a microphone array that records geo-tagged audio files. For each file, annotate which bird songs are present. Done by human volunteers, this is a huge task, and perhaps unreliable.
Goal: directly (computationally) compare field audio files to the annotated archive of audio files to determine which birds are present.

[R. Bardeli, D. Wolff, F. Kurth, M. Koch, K.-H. Tauchert, K.-H. Frommolt, Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring, Pattern Recognition Letters (2010).]

Tierstimmenarchiv (animal sound archive)

About 120,000 animal sound files. Well annotated, an extremely valuable resource. It can be used freely by anybody with a good (non-commercial) reason (e.g. science, nature conservation, art, teaching, ...).
Even though the archive is big, for the more variable bird songs it hardly covers all the variability there is.
There is a similar archive at Cornell (Chris Clark).

Finding a bird by its song

Analogous to searching the database of musical scores. This requires feature extraction, a topic for a different talk. Briefly, current methods involve taking the Fourier transform of the audio signal and studying the resulting data via visual properties. Informally, translate the audio file into a sequence of integer vectors representing the dominant features of the sound. For our purposes we will treat a query Q as a long integer vector (a subset of musical notes).
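An informal sketch of the "dominant features" idea (my illustration, not Bardeli's actual pipeline): frame the signal, take the FFT, and keep the indices of the strongest frequency bins.

```python
import numpy as np

def dominant_bins(signal, frame=1024, hop=512, n_peaks=3):
    """Turn an audio signal into a sequence of small integer vectors:
    for each frame, the indices of the largest magnitude-spectrum bins."""
    feats = []
    for start in range(0, len(signal) - frame + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame]))
        feats.append(np.sort(np.argsort(spectrum)[-n_peaks:]))
    return np.array(feats)

t = np.arange(44100) / 44100.0            # one second at 44.1 kHz
print(dominant_bins(np.sin(2 * np.pi * 440 * t))[:2])   # bins near 440 Hz
```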

Dictionary based methods: sparse represenation Dictionary based methods present a paradigm for feature extraction, compression, and source separation, and much more. Sample signal: x R k, a column vector of length k. Prescribed dictionary D R k N : column vectors d i (features) each of length k, and a large number N of possible features (k << N) D = d 1... d N Describe x as a linear combination of a small number of the columns of D, i.e. find (approximate) s R N x = Ds. Amounts to computing dot products of the rows of D with s. Optimization methods require this computation to be done repeatedly. (following [Sturm, Roads, et. al.], [Chandrasekaran, Parrilo, et. al.] Luke Oeding (UC Berkeley) Geometry & Pattern Recognition April 30, 2012 12 / 23

Dictionary based methods: sparse representation

Find the best s subject to x = Ds.
To compute Ds, one must compute many dot products with s, and optimization methods require this computation to be done repeatedly.
Idea: find and exploit structure in D to speed up the computation. For example, if D has low rank, block structure, sparse structure, a nice factorization, etc., the computation can be sped up considerably.

Question: How can we automatically find the structure in D that will make this speed-up possible?
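For concreteness, here is a bare-bones orthogonal matching pursuit, one standard way to find a sparse s (a sketch, not the talk's method). The inner loop is exactly the repeated dot-product computation that structure in D would accelerate.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedily select columns of D to approximate x = D s with sparse s."""
    k, N = D.shape
    residual, support = x.astype(float), []
    for _ in range(n_nonzero):
        # N dot products of the columns of D with the residual: the step
        # that low-rank / sparse / block structure in D would speed up.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s = np.zeros(N)
    s[support] = coef
    return s
```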

Dictionary based methods: dictionary learning

Alternatively, let r_1, ..., r_k denote the row vectors in R^N representing known bird songs. We would like to learn a good dictionary structure for
D = [r_1; ... ; r_k] = [d_1 | ... | d_N].
We want to extract the dominant features d_i and express D as a sparse matrix (plus noise). Learning a sparse structure for D means that the computations x = Ds may be done more quickly.
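A minimal alternating-minimization sketch of dictionary learning (illustrative only; production systems use K-SVD or similar): alternate a soft-threshold code update with a least-squares dictionary update.

```python
import numpy as np

def learn_dictionary(X, n_atoms, n_iter=50, lam=0.1):
    """X: k x m training signals. Returns D (k x n_atoms) and sparse
    codes S (n_atoms x m) with X approximately equal to D S."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    S = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_iter):
        # Code update: one proximal-gradient (ISTA) step on ||X-DS||^2 + lam*||S||_1.
        step = 1.0 / np.linalg.norm(D, 2) ** 2
        S = S - step * (D.T @ (D @ S - X))
        S = np.sign(S) * np.maximum(np.abs(S) - step * lam, 0.0)
        # Dictionary update: least squares in D, then renormalize the atoms.
        D = X @ np.linalg.pinv(S)
        D /= np.linalg.norm(D, axis=0) + 1e-12
    return D, S
```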

Dictionary structures & sparsity

Most restrictive: assume the dictionary atoms d_i differ only by a sparse additive error,
D = [d_1 | ... | d_N] = [d_1 | ... | d_1] + E,
where d_i ∈ R^k and E ∈ R^{k×N} is sparse in the sense that most of its entries are approximately zero. We say the dictionary structure is rank one plus sparse noise.
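A quick numeric illustration of this regime (my example, under the stated sparsity assumption): when every column is a common atom d plus sparse error, the entrywise median across columns recovers d whenever fewer than half the entries in each row are corrupted.

```python
import numpy as np

rng = np.random.default_rng(1)
k, N = 64, 200
d = rng.standard_normal(k)
E = rng.standard_normal((k, N)) * (rng.random((k, N)) < 0.05)  # sparse error
D = np.tile(d, (N, 1)).T + E          # rank one plus sparse noise

d_hat = np.median(D, axis=1)          # robust estimate of the common atom
print(np.max(np.abs(d_hat - d)))      # essentially 0: medians ignore outliers
```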

Dictionary structures & sparsity

Relax the assumptions: assume the dictionary atoms d_i differ only by a sparse additive error and a linear scaling,
D = [d_1 | ... | d_N] = [d_1 λ_1 | ... | d_1 λ_N] + E = d_1 λ^T + E,
where d_i ∈ R^k, λ is the column vector λ = (λ_1, ..., λ_N), and E is sparse. The dictionary structure is still rank one plus sparse noise.

Dictionary structures & sparsity

Relax the assumptions more: assume the dictionary atoms depend only on linear combinations of a subset of atoms {d_1, ..., d_r} (after reordering), with r < k << N, plus a sparse additive error:
D = [d_1 | ... | d_r] F + E = D̃F + E,
where D̃ = [d_1 | ... | d_r], F = (f_{i,j}) is an r × N coefficient matrix, E is sparse, and r << N.

Question: Given a dictionary D, find efficient methods to determine the best structure of this low-rank-plus-sparse-noise format.
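One common heuristic for this recovery problem (an illustrative sketch, not a method from the talk): alternate a truncated SVD for the rank-r part with hard-thresholding for the sparse part.

```python
import numpy as np

def split_low_rank_sparse(D, r, thresh, n_iter=100):
    """Heuristically write D = L + E with rank(L) <= r and E sparse."""
    E = np.zeros_like(D, dtype=float)
    for _ in range(n_iter):
        # Best rank-r approximation of D - E via the SVD.
        U, s, Vt = np.linalg.svd(D - E, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        # Keep only the large residual entries as the sparse error E.
        R = D - L
        E = np.where(np.abs(R) > thresh, R, 0.0)
    return L, E
```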

A motivating example: low rank plus diagonal

Let N = k and D, R, E ∈ R^{k×k}, with E diagonal and rank(R) = r. What algebraic relations are imposed on the matrix D by D = R + E?
If r = 0, the off-diagonal entries vanish.
If r = 1, the off-diagonal 2×2 minors vanish.
If r = 2, the off-diagonal 3×3 minors vanish, but these aren't all the relations.

[Drton, Sturmfels, Sullivant, Algebraic factor analysis: tetrads, pentads and beyond, Probability Theory and Related Fields (2007).]
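The r = 1 case can be checked symbolically; a small sympy sketch (k = 4 chosen for illustration):

```python
import sympy as sp

k = 4
a = sp.symbols(f'a1:{k + 1}')
b = sp.symbols(f'b1:{k + 1}')
e = sp.symbols(f'e1:{k + 1}')
# D = R + E with R = a b^T of rank one and E diagonal.
D = sp.Matrix(k, k, lambda i, j: a[i] * b[j] + (e[i] if i == j else 0))

# An off-diagonal 2x2 minor (rows {1,2}, columns {3,4}) avoids every
# diagonal entry, so the diagonal error cannot hide the rank-one relation.
print(sp.expand(D[0, 2] * D[1, 3] - D[0, 3] * D[1, 2]))   # prints 0
```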

Rank 2 plus sparse error

What are the dimension and degree of the first variety from the opening slide? (Take the Zariski closure; ∗ = anything.)

[Display: X written as a product of two ∗-matrices plus a sparse matrix with a prescribed zero pattern.]

Answer: X is a hypersurface of degree 3, cut out by a determinant.

(Thanks to Bernd Sturmfels and Zvi Rosen for these examples.)

Rank 2 plus sparse error

What are the dimension and degree of the second variety from the opening slide? (Take the Zariski closure; ∗ = anything.)

[Display: Y written as a product of two ∗-matrices plus a sparse matrix with a prescribed zero pattern.]

Answer: Y is a hypersurface of degree 7 (found by performing elimination).

(Thanks to Bernd Sturmfels and Zvi Rosen for these examples.)

Low rank plus sparse error

Consider a zero-pattern P ∈ {0,1}^{k×N}, and write |P| = Σ_{i,j} P_{i,j}. An error matrix E ∈ R^{k×N} has zero-pattern P if the (approximately) zero entries of E occur at the zeros of P; write E ∈ R{P}. Define X_r(P) ⊆ R^{k×N} as the Zariski closure of the set of matrices that are rank r plus an error with zero-pattern P:

X_r(P) := { AB + E ∈ R^{k×N} : A ∈ R^{k×r}, B ∈ R^{r×N}, E ∈ R{P} }
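Sampling points of X_r(P) is straightforward, and already gives a numerical probe of the dimension question on the next slide (the dimension is the generic rank of the Jacobian of this parametrization); a minimal sketch, with the function name my own:

```python
import numpy as np

def sample_X_r_P(P, r, rng=np.random.default_rng()):
    """Random point of X_r(P): a rank-r matrix plus an error supported
    on the ones of the zero-pattern P."""
    k, N = P.shape
    A = rng.standard_normal((k, r))
    B = rng.standard_normal((r, N))
    E = rng.standard_normal((k, N)) * P    # zero outside the pattern P
    return A @ B + E

P = np.zeros((4, 4)); np.fill_diagonal(P, 1)   # diagonal zero-pattern
print(sample_X_r_P(P, r=2).shape)
```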

Low rank plus sparse error

X_r(P) := { AB + E ∈ R^{k×N} : A ∈ R^{k×r}, B ∈ R^{r×N}, E ∈ R{P} }

Questions: study the varieties X_r(P) as P varies, keeping k, N, and r fixed.
1. What are the dimensions of X_r(P)?
2. What are the degrees of X_r(P)?
3. Can we find nice defining equations for X_r(P) ⊆ R^{k×N}?
4. What are sufficient conditions to separate X_r(P) from X_r(P')?
5. For which k, N, r, P are these models identifiable?
6. Given that D ∈ X_r(P), how can we effectively find A, B such that D = AB + E?

The questions are most interesting when r < k << N and |P| is small. One can ask the analogous questions for other matrix structures.

Wrap-up

In multimedia pattern recognition (and many other fields), data sets grow faster than we can process or make sense of them.
Most applications use some sort of optimization method to solve the computational problem, with many successes, but there is much room for improvement.
Let's study the geometric and algebraic structures that naturally arise in these applications, even in the smallest cases.
We hope that the new insights we find will shed light on, and lead to improvements and new developments in, computational techniques.