
Dr. Marc Pendzich, Institute of Musicology, University of Hamburg. Dr. Daniel Müllensiefen, Goldsmiths College, University of London

Outline:
1. Introduction
2. Method
3. Empirical Study
4. Summary / next steps

Melodic plagiarism: huge public interest and importance for the pop industry, but little research. Exceptions: Stan Soocher, They Fought The Law (1999); Charles Cronin, Concepts of Melodic Similarity in Music-Copyright (1998).

The aims of the study:
- explore how melodic similarity, as measured by modern algorithms, relates to court decisions in individual cases
- measure the similarity of the melody pairs in a sample of cases taken from a collection of court cases
- evaluate the predictive power of the algorithmic measurements when compared to the court rulings

20 cases spanning the years 1970 to 2005, with a focus on melodic aspects of music copyright infringement. Procedure:
- creation of monophonic MIDI files
- analysis of the judges' written opinions
- reduction of the court decisions to two categories: pro plaintiff = melodic plagiarism; contra plaintiff = no infringement

The Chiffons, He's So Fine (1963): No. 1 in the US, highest UK chart position 11. George Harrison, My Sweet Lord: single published in 1971, a No. 1 hit in the US, UK, and West Germany.

Ronald Selle, Let It End vs. Bee Gees, How Deep Is Your Love (1977)

How do court decisions relate to melodic similarity? What is the frame of reference (directionality of comparisons)? How is prior musical knowledge taken into account?

Idea: the frequency of melodic elements is important for similarity assessment. Inspired by computational linguistics (Baayen, 2001) and text retrieval (Manning & Schütze, 1999). Conceptual components:
- m-types (aka n-grams) as melodic elements
- frequency counts: type frequency (TF) and inverse document frequency (IDF)

Running example: "Twinkle, twinkle, little star, how I wonder what you are."

Word type t   f(t)      Melodic type τ (pitch interval, length 2)   f(τ)
Twinkle       2         ( 0, +7)                                    1
little        1         (+7,  0)                                    1
star          1         ( 0, +2)                                    1
How           1         (+2,  0)                                    1
I             1         ( 0, -2)                                    3
wonder        1         (-2, -2)                                    1
what          1         (-2,  0)                                    2
you           1         ( 0, -1)                                    1
are           1         (-1,  0)                                    1
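To make the m-type idea concrete, here is a minimal Python sketch (not from the slides; the MIDI pitches and key are assumptions for illustration) that derives the interval-bigram counts above from the Twinkle, Twinkle melody:

```python
from collections import Counter

# MIDI pitches of "Twinkle, twinkle, little star, how I wonder what you are"
# (assumed: C major starting at middle C)
pitches = [60, 60, 67, 67, 69, 69, 67,   # Twinkle twinkle little star
           65, 65, 64, 64, 62, 62, 60]   # how I wonder what you are

# Pitch intervals between successive notes
intervals = [b - a for a, b in zip(pitches, pitches[1:])]

# m-types of length n: overlapping n-grams of intervals
def m_types(seq, n=2):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

counts = Counter(m_types(intervals))
for mtype, freq in counts.items():
    print(mtype, freq)   # e.g. (0, -2) occurs 3 times, matching the table
```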

Definitions: C = corpus of melodies; m = a melody; τ = a melodic type; T = number of different melodic types in m; |{m : τ ∈ m}| = number of melodies containing τ.

$$\mathrm{TF}(m,\tau) = \frac{f_m(\tau)}{\sum_{i=1}^{T} f_m(\tau_i)} \qquad \mathrm{IDF}_C(\tau) = \log\!\left(\frac{|C|}{|\{m : \tau \in m\}|}\right) \qquad \mathrm{TFIDF}_{m,C}(\tau) = \mathrm{TF}(m,\tau)\cdot\mathrm{IDF}_C(\tau)$$

Melodic type τ (pitch interval, length 2)   f(τ)   TF(m, τ)   IDF_C(τ)   TFIDF_{m,C}(τ)
( 0, +7)                                    1      0.11       1.57       0.1727
(+7,  0)                                    1      0.11       1.36       0.1496
( 0, +2)                                    1      0.11       0.23       0.0253
(+2,  0)                                    1      0.11       0.28       0.0308
( 0, -2)                                    3      0.33       0.16       0.0528
(-2, -2)                                    1      0.11       0.19       0.0209
(-2,  0)                                    2      0.22       0.22       0.0484
( 0, -1)                                    1      0.11       0.51       0.0561
(-1,  0)                                    1      0.11       0.74       0.0814
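A minimal sketch of these weightings in code, assuming m-type counts from a melody; doc_freq and corpus_size are hypothetical placeholders, not values from the study:

```python
import math
from collections import Counter

def tf(counts: Counter) -> dict:
    """Type frequency: relative frequency of each m-type within the melody."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def idf(doc_freq: dict, corpus_size: int) -> dict:
    """Inverse document frequency over the corpus C:
    IDF_C(tau) = log(|C| / #melodies containing tau)."""
    return {t: math.log(corpus_size / df) for t, df in doc_freq.items()}

def tfidf(counts: Counter, doc_freq: dict, corpus_size: int) -> dict:
    """TFIDF_{m,C}(tau) = TF(m, tau) * IDF_C(tau)."""
    tf_, idf_ = tf(counts), idf(doc_freq, corpus_size)
    return {t: tf_[t] * idf_[t] for t in counts}

# Hypothetical corpus statistics: 100 melodies, of which 21 contain (0, +7),
# 85 contain (0, -2), and 80 contain (-2, 0).
counts = Counter({(0, 7): 1, (0, -2): 3, (-2, 0): 2})
doc_freq = {(0, 7): 21, (0, -2): 85, (-2, 0): 80}
print(tfidf(counts, doc_freq, corpus_size=100))
```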

" C (s,t) = ' # % sn& t n TFIDF s, C (# )$TFIDF t, C (# ) ' # % sn& t n ' # % sn& t n ( TFIDF s, C (# ) 2 $ TFIDF t, C (# ) ( ) 2

Ratio Model (Tversky, 1977): similarity σ(s,t) depends on the features that s and t have in common, weighted by a salience function f(·):

$$\sigma(s,t) = \frac{f(s_n \cap t_n)}{f(s_n \cap t_n) + \alpha f(s_n \setminus t_n) + \beta f(t_n \setminus s_n)}, \qquad \alpha, \beta \geq 0$$

Here: features ⇒ m-types; salience ⇒ IDF and TF. Different values of α and β change the frame of reference. Variable m-type lengths (n = 1, ..., 4) are combined by an entropy-weighted average.

Tversky.equal measure (with α = β = 1):

$$\sigma(s,t) = \frac{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau)}{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau) + \sum_{\tau \in s_n \setminus t_n} \mathrm{IDF}_C(\tau) + \sum_{\tau \in t_n \setminus s_n} \mathrm{IDF}_C(\tau)}$$

Tversky.plaintiff.only measure (with α = 1, β = 0):

$$\sigma_{\mathrm{plaintiff.only}}(s,t) = \frac{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau)}{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau) + \sum_{\tau \in s_n \setminus t_n} \mathrm{IDF}_C(\tau)}$$

Tversky.defendant.only measure (with α = 0, β = 1):

$$\sigma_{\mathrm{defendant.only}}(s,t) = \frac{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau)}{\sum_{\tau \in s_n \cap t_n} \mathrm{IDF}_C(\tau) + \sum_{\tau \in t_n \setminus s_n} \mathrm{IDF}_C(\tau)}$$

Tversky.weighted measure, with

$$\alpha = \frac{\sum_{\tau \in s_n \cap t_n} \mathrm{TF}_s(\tau)}{\sum_{\tau \in s_n} \mathrm{TF}_s(\tau)} \quad \text{and} \quad \beta = \frac{\sum_{\tau \in s_n \cap t_n} \mathrm{TF}_t(\tau)}{\sum_{\tau \in t_n} \mathrm{TF}_t(\tau)}$$
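The family can be sketched as one function with IDF as the salience f(·), where alpha and beta select the variant (an illustration under the definitions above, not the authors' implementation):

```python
def tversky_similarity(s_types: set, t_types: set, idf: dict,
                       alpha: float = 1.0, beta: float = 1.0) -> float:
    """Ratio-model similarity with IDF_C as feature salience.
    alpha = beta = 1      -> Tversky.equal
    alpha = 1, beta = 0   -> Tversky.plaintiff.only
    alpha = 0, beta = 1   -> Tversky.defendant.only"""
    salience = lambda types: sum(idf.get(tau, 0.0) for tau in types)
    common = salience(s_types & t_types)
    denom = (common
             + alpha * salience(s_types - t_types)
             + beta * salience(t_types - s_types))
    return common / denom if denom > 0 else 0.0
```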

Ground truth: 20 cases with a yes/no decision (7 yes / 13 no). Evaluation metrics:
- Accuracy (% correct at the optimal cut-off on the similarity scale)
- AUC (area under the receiver operating characteristic curve)
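A sketch of this evaluation, assuming scikit-learn and purely hypothetical similarity scores and rulings (the real study has 20 cases; the numbers below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical rulings (1 = pro plaintiff, 0 = contra plaintiff) and
# algorithmic similarity scores for ten imaginary cases.
rulings = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])
scores = np.array([0.81, 0.34, 0.70, 0.45, 0.22, 0.64, 0.68, 0.12, 0.77, 0.40])

# AUC: probability that a randomly chosen "yes" case scores higher
# than a randomly chosen "no" case.
auc = roc_auc_score(rulings, scores)

# Accuracy at the optimal cut-off: try each observed score as a threshold.
accuracy = max(((scores >= c) == rulings).mean() for c in scores)
print(f"AUC = {auc:.2f}, accuracy at optimal cut-off = {accuracy:.0%}")
```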

Ronald Selle, Let It End vs. Bee Gees, How Deep Is Your Love. Observations:
- the decision is sometimes based on characteristic motives
- high-level form can be important (e.g. call-and-response structure)
- the reference point can differ between cases

Court decisions can be closely related to melodic similarity. The plaintiff's song is often the frame of reference. Statistical information about the commonness of melodic elements is important.

Next steps:
- more US cases; UK and German cases (from the big Western markets)
- include rhythm in the m-types
- compare with more similarity algorithms from the literature

Thank you for your attention! Dr. Marc Pendzich, Institute of Musicology, University of Hamburg. Dr. Daniel Müllensiefen, Goldsmiths College, University of London.