
Modeling Music as Markov Chains: Composer Identification

Music 254 Final Report
Yi-Wen Liu (jacobliu@ccrma.stanford.edu)
Professor Eleanor Selfridge-Field
TA: Craig Sapp
11 June 2002 (edited 20 October 2006)

Abstract

This research proposes that music can be treated as a random process, and music style recognition as a system identification problem. A general framework for modeling music with Markov chains is described, and a two-way composer identification scheme based on this framework is demonstrated. The scheme uses the Kullback-Leibler distance as the metric between distribution functions, and it is shown that, when the marginal distributions are identical, the scheme performs maximum likelihood identification. Composer identification experiments are conducted on all the string quartets written by Mozart and Haydn, and the results are documented and discussed.

1 Motivation

The goal of this research is to develop a model that enables a computer to symbolically examine a piece of classical music it has never been exposed to and identify the composer. As a classical music fan, I have observed that when we listen to unfamiliar music, we almost always try to classify its style; we even guess at the composer, and a person who guesses accurately is often considered a better listener. While this research avoids the controversy of quantifying how well a person listens to music, I remain curious whether music appreciation, and composer identification in particular, can be statistically learned by a computer.

I claim that music can be treated as a random process. The claim rests on the fact that music consists of time sequences of musical events, such as notes, chords, dynamics, and rhythmic patterns, and other types of time sequences, such as stock prices, are already modeled quite successfully by random processes.

Therefore, I claim that the process of musical composition can be modeled as a realization of an underlying random process, and that this underlying random process is what we fuzzily call musical style. If we further hypothesize that each composer has a unique style, then the composer identification problem becomes what engineers know as a system identification problem.

Among the many existing system identification techniques, this research explores the Markov chain model for its simplicity. Because of the simple assumption that the future is conditionally independent of the past given the present, a Markov chain can be characterized by its state-transition probability matrix. However, this assumption ignores macroscopic musical structure, so it is fair to question the validity of Markov modeling of music in general. Although I am aware of this general validity issue, I remain optimistic that Markov models may be sufficiently sophisticated to teach a computer to appreciate music, as far as composer identification is concerned.

The rest of the paper is organized as follows. Section 2 describes the general methods. Section 3 documents the experiments conducted so far and discusses the results. Section 4 points out possible future directions, and Section 5 concludes.

2 Methods

To describe how music is modeled by Markov chains, let us first define the terminology and notation. A first-order, discrete-time Markov chain C is a random walk X_t, t = 1, 2, 3, ..., in a state space S = \{s_1, s_2, \ldots, s_N\} according to an N \times N state-transition matrix

    P_{i,j} = p(X_{t+1} = s_j \mid X_t = s_i),

where P_{i,j} denotes the element in the i-th row and j-th column, and p(\cdot) is the usual notation for a conditional probability distribution function. Mathematically, it suffices to say that a Markov chain C is characterized by its state-transition matrix P, up to one-to-one mappings between homeomorphic state spaces; one can even sloppily write C = P. What the transition matrix actually means, however, depends on how the state space is defined. For example, if the (pentatonic) state space is defined as S = \{C4, D4, E4, A4, G4\}, then P_{1,5} = 0.3 reads "when the current note is C4, the next note is G4 thirty percent of the time." Since higher-order Markov chains are beyond the scope of this research, "Markov chain" means a first-order chain in the rest of this paper unless otherwise mentioned.

2.1 Computational models of music styles

The following steps describe how to build a Markov chain model for a music style.

Step 1: Define the repertoire of that style.

Step 2: Encode all the works of that repertoire.

Step 3: Define the state space S, consisting of musical events.

Step 4: From all the encoded works available in that repertoire, calculate the matrix P of conditional event-transition probabilities.

Step 5: Let C be the Markov chain corresponding to \{P, S\}. Call C a Markov chain model for that particular style of music.

Step 1 can be a pain, since musical styles are often a vague notion to human beings. Is Beethoven's Moonlight Sonata classical or romantic? Here it is assumed that the human researcher somehow defines the repertoire clearly. Step 2 is not entirely straightforward either: everything depends on which musical features are relevant, and data entry is another practical difficulty, since it takes forever if the repertoire is huge. Step 3 involves defining an interesting, prominent feature associated with the style and quantizing that feature in a clever way; the state space consists of all the possible values to which the feature is quantized. In Step 4, the transition matrix is obtained by normalizing the histogram of state-transition instances over all the works in the repertoire.

2.2 Composer identification by comparing Kullback-Leibler distances

Assume that composers A and B each have a unique style that can be recognized in a certain repertoire. One can then select a feature that looks interesting, define the state space, and build the Markov chain models C_A = P^A and C_B = P^B for composers A and B, respectively. Based on these two models, for each given unknown work U, two-way composer identification can be achieved by the hypothesis test

    \text{MyAuthor} = \begin{cases} A, & D(P^U \| P^A) < D(P^U \| P^B), \\ B, & \text{otherwise}, \end{cases}

where P^U is the transition matrix estimated from the repertoire containing only the unknown work U, MyAuthor is the composer more likely than the other to have written the work, and D is a distance metric on the space of N \times N probability matrices. In this research, we use the Kullback-Leibler distance,

    D(P^{(1)} \| P^{(2)}) = \sum_{i=1}^{N} \sum_{j=1}^{N} P^{(1)}_{i,j} \log_2 \frac{P^{(1)}_{i,j}}{P^{(2)}_{i,j}}.

Although the Kullback-Leibler distance is not symmetric and does not obey the triangle inequality, it is useful to interpret it as a distance between distributions [1].
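The model-building steps and the hypothesis test above fit in a few lines of code. The following Python sketch is illustrative, not the author's implementation: the function names, the smoothing constant (the paper does not say how zero counts are handled; a small constant keeps the distances finite), and the toy sequences are all assumptions. Following the proof of the theorem below, the matrices are normalized histograms of consecutive-event pairs.

    import numpy as np

    def pair_distribution(sequences, n_states, smoothing=1e-6):
        # Step 4: normalized histogram of consecutive-state pairs.
        counts = np.full((n_states, n_states), smoothing)
        for seq in sequences:
            for a, b in zip(seq[:-1], seq[1:]):
                counts[a, b] += 1.0
        return counts / counts.sum()

    def kl_distance(p, q):
        # D(P || Q) = sum_ij P_ij * log2(P_ij / Q_ij)
        mask = p > 0
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

    def identify(unknown_seq, p_a, p_b):
        # Two-way hypothesis test: attribute U to the closer model.
        p_u = pair_distribution([unknown_seq], p_a.shape[0])
        return "A" if kl_distance(p_u, p_a) < kl_distance(p_u, p_b) else "B"

    # Toy usage with the pentatonic state space {C4, D4, E4, A4, G4} -> 0..4:
    style_a = pair_distribution([[0, 1, 2, 1, 0, 4], [0, 4, 2, 1, 0]], 5)
    style_b = pair_distribution([[2, 3, 4, 3, 2, 4], [4, 3, 2, 3, 4]], 5)
    print(identify([0, 1, 2, 1, 0], style_a, style_b))   # prints "A"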

Accepting this interpretation, the hypothesis test reads: if the transition matrix of the unknown work is closer to that of composer A's repertoire, then the work is more likely to have been written by A. This hand-waving statement can be justified by the following theorem, in which, as the proof makes explicit, P_{i,j} denotes the joint probability of the consecutive state pair (s_i, s_j) and P_i = \sum_j P_{i,j} denotes its marginal.

THEOREM (likelihood-ratio interpretation). For the two-way composer identification test, if the two marginal distributions are identical,

    P^A_i = \sum_j P^A_{i,j} = \sum_j P^B_{i,j} = P^B_i \quad \forall i,

then the difference of the two distances has a likelihood-ratio interpretation:

    2^{D(P^U \| P^A) - D(P^U \| P^B)} = \left( \frac{p(U \mid B)}{p(U \mid A)} \right)^{1/L},

where p(U | A) is the a priori probability that U = \{U_1, U_2, \ldots\} is generated according to P^A, and L is the length of U.

Proof:

    D(P^U \| P^A) - D(P^U \| P^B)
        = \sum_{i,j} P^U_{i,j} \log_2 \frac{P^U_{i,j}}{P^A_{i,j}}
          - \sum_{i,j} P^U_{i,j} \log_2 \frac{P^U_{i,j}}{P^B_{i,j}}
        = \sum_{i,j} P^U_{i,j} \log_2 \frac{P^B_{i,j}}{P^A_{i,j}}
        = \frac{1}{L} \sum_{i,j} n_{i,j} \log_2 \frac{P^B_{i,j}}{P^A_{i,j}},

where n_{i,j} denotes how many times the transition from the i-th to the j-th state occurs in U. Therefore,

    2^{D(P^U \| P^A) - D(P^U \| P^B)}
        = \left( \prod_{t=1}^{L-1} \frac{P^B_{U_t, U_{t+1}}}{P^A_{U_t, U_{t+1}}} \right)^{1/L}                                              (1)
        = \left( \frac{P^B_{U_1}}{P^A_{U_1}} \prod_{t=1}^{L-1} \frac{P^B_{U_t, U_{t+1}} / P^B_{U_t}}{P^A_{U_t, U_{t+1}} / P^A_{U_t}} \right)^{1/L}   (identical marginals)   (2)
        = \left( \frac{p(U \mid B)}{p(U \mid A)} \right)^{1/L}.                                                                             (3)
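As a sanity check on the theorem, the following sketch (mine, with illustrative numbers) builds two two-state chains that share the uniform marginal, samples a sequence from chain A, and compares 2^{ΔD} against the per-transition likelihood ratio. Normalizing the empirical pair histogram by the L - 1 observed transitions, rather than by L, makes the identity exact rather than approximate.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two transition matrices with the same (uniform) stationary distribution;
    # the joint pair distributions are then P_ij = 0.5 * T_ij.
    T_A = np.array([[0.7, 0.3], [0.3, 0.7]])
    T_B = np.array([[0.4, 0.6], [0.6, 0.4]])
    P_A, P_B = 0.5 * T_A, 0.5 * T_B

    # Sample a length-L sequence U from chain A.
    L = 5000
    u = [0]
    for _ in range(L - 1):
        u.append(rng.choice(2, p=T_A[u[-1]]))

    # Empirical joint pair distribution of U.
    P_U = np.zeros((2, 2))
    for a, b in zip(u[:-1], u[1:]):
        P_U[a, b] += 1.0
    P_U /= L - 1

    def kl(p, q):
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    delta = kl(P_U, P_A) - kl(P_U, P_B)
    llr = sum(np.log2(T_B[a, b] / T_A[a, b]) for a, b in zip(u[:-1], u[1:]))
    print(2.0 ** delta, 2.0 ** (llr / (L - 1)))   # the two values coincide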

The theorem states that, for the purposes of composer identification, if the marginals of the two styles are identical and hence carry no information, then it is necessary to apply Markov modeling, and the hypothesis test above is a maximum likelihood test. For example, if two composers use each of the notes equally often but concatenate them differently, then the Kullback-Leibler distance is the right metric to choose for composer identification.

3 Experiments and discussion

3.1 String quartets: Mozart vs. Haydn

Markov chain models were built for Mozart's and Haydn's scale-degree-class transitions. The repertoire consists of all of their string quartets: 100 movements by Mozart and 212 movements by Haydn. The scale degree is defined relative to the tonic, i.e., all works are transposed into the same key. To reduce the dimensionality of the transition matrices, the system distinguishes neither between occurrences of the same degree class in different octaves nor between major and minor keys. Two-way identification tests based on Kullback-Leibler distances were then conducted for each of the four voices, in a leave-one-out manner: the computer is always assumed to have been exposed to every work in the repertoire except the one to be identified.

    Part        Mozart    Haydn
    Violin I    68.0%*    64.2%*
    Violin II   58.0%     64.2%*
    Viola       61.0%*    53.8%
    Cello       57.0%     52.8%

    Table 1: Mozart vs. Haydn identification tests. Asterisks mark statistically significant rates.

Table 1 shows the results of the two-way identification tests. To qualify as statistically significant, a recognition rate has to exceed \mu + 2\sigma for random flips of fair coins, where the mean \mu is 50% and the standard deviation \sigma is, by the central limit theorem, inversely proportional to the square root of the number of flips: the threshold is 50% + 2(50%/\sqrt{100}) = 60% for the 100 Mozart movements and 50% + 2(50%/\sqrt{212}) \approx 56.9% for the 212 Haydn movements.

As a controlled experiment, repertoires were defined as two randomly assigned, mutually exclusive lists of quartets, and the same two-way style identification tests were conducted; the results are shown in Table 2, where each entry is an average over ten runs.

Human listening tests were conducted via a web survey. A user is played the MIDI piano version of a quartet by either composer with equal probability and is asked to identify the composer. As of June 11, 2002, web users averaged 59.0% accuracy over 1865 attempts. Their experience with the repertoire ranges from classical-music novices to string players who have performed some of the works.
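A sketch of this leave-one-out protocol, reusing pair_distribution and kl_distance from the sketch in Section 2; mozart_seqs and haydn_seqs, lists holding one integer scale-degree sequence per movement over the 12 degree classes, are hypothetical inputs.

    def leave_one_out_rate(own_seqs, other_seqs, n_states=12):
        # Fraction of own_seqs attributed to their true composer.
        other_model = pair_distribution(other_seqs, n_states)
        hits = 0
        for k, work in enumerate(own_seqs):
            rest = own_seqs[:k] + own_seqs[k + 1:]   # withhold the test work
            own_model = pair_distribution(rest, n_states)
            p_u = pair_distribution([work], n_states)
            if kl_distance(p_u, own_model) < kl_distance(p_u, other_model):
                hits += 1
        return hits / len(own_seqs)

    # e.g. the Mozart column of Table 1: leave_one_out_rate(mozart_seqs, haydn_seqs)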

    Part        Random A    Random B
    Violin I    42.6%       56.1%
    Violin II   44.5%       53.4%
    Viola       47.4%       51.9%
    Cello       43.1%       53.4%

    Table 2: Random-repertoire identification tests.

4 Future directions

- Tests on other features, such as intervallic values, rhythmic patterns, harmonic rhythms, and chord progressions.
- Combination of the opinions of multiple agents.
- Tests on other pairs of repertoires that sound similar to human listeners, such as Schumann's vs. Schubert's lieder, or J.S. Bach's vs. Handel's choral music.

5 Conclusion

We proposed a scheme for modeling music using Markov chains and applied it to two-way composer identification. Experiments examined the scale-degree transitions in Haydn's and Mozart's string quartets, and the results showed that, as far as the recognition rate is concerned, the computer's performance can be statistically significant.

References

[1] T. Cover and J. Thomas, Elements of Information Theory, Wiley-Interscience, New York, 1991.