Visual matching: distance measures
1 Visual matching: distance measures
Metric and non-metric distances: what distance to use. It is generally assumed that visual data may be thought of as vectors (e.g. histograms) that can be compared for similarity using the Euclidean distance or, more generally, metric distances. Given a set S of patterns, a distance d: S × S → R is metric if it satisfies:
Self-identity: ∀x ∈ S, d(x,x) = 0
Positivity: ∀x ≠ y ∈ S, d(x,y) > 0
Symmetry: ∀x,y ∈ S, d(x,y) = d(y,x)
Triangle inequality: ∀x,y,z ∈ S, d(x,z) ≤ d(x,y) + d(y,z)
However, this may not be a valid assumption. A number of approaches in computer vision compare images using measures of similarity that are neither Euclidean nor even metric, in that they do not obey the triangle inequality or symmetry.
2 The most notable cases where non-metric distances are suited are:
Recognition systems that attempt to faithfully reflect human judgments of similarity. Much research in psychology suggests that human similarity judgments are not metric and that distances are not symmetric.
Matching of subsets of the images while ignoring the most dissimilar parts. In this case non-metric distances are less affected by extreme differences than the Euclidean distance and more robust to outliers. Distance functions that are robust to outliers or to extremely noisy data will typically violate the triangle inequality.
Comparison between data that are the output of a complex algorithm: image comparison using a deformable template matching scheme, for example, has no obvious way of ensuring that the triangle inequality holds.
Histogram-based representations. Feature vectors are often in the form of histograms that collect the distribution of salient features. Several distances can be defined between histograms; the choice depends on the goals of the matching and on the type of histogram. If human perceptual similarity must be accounted for, non-metric distances are preferred. Grey-level and color histograms are the most frequently used.
3 Human judgements of similarity
Symmetry does not always hold for human perception: d(a,b) < d(b,a).
Part-based representations: partial similarity is a non-metric relation, since d(a,b) + d(b,c) < d(a,c) can occur.
Example with A (human), B (centaur), C (horse): "Am I human? Yes, I am partially human. I am a centaur. Am I equine? Yes, I am partially equine."
4 Similarity from complex algorithms
Shape deformation similarity is non-metric: similarity can be assessed by minimizing the energy of deformation E spent while maximizing the matching M between edges.
Metric distances
Feature matching where image data are represented by vector data is well suited to metric distances. Many metric distance measures are possible. Among them:
Heuristic: Minkowski form
Geometric: cosine distance
Working with distributions (histograms): Euclidean, L1, Hamming, Weighted Mean Variance (WMV)
5 Minkowski distance
The L_p metrics, also called Minkowski distances, are defined for two feature vectors A = (x_1, ..., x_n) and B = (y_1, ..., y_n) as:
d_p(A,B) = ( Σ_{i=1..n} |x_i − y_i|^p )^{1/p}
L_1, City Block or Manhattan: d_1 = Σ_{i=1..n} |x_i − y_i|
L_2, Euclidean distance: d_2 = ( Σ_{i=1..n} (x_i − y_i)^2 )^{1/2}
L_∞, max or chessboard distance: d_∞ = max_i |x_i − y_i|
The L_1 and L_2 norms are the most used because of their low computational cost.
A circle is the set of points at a fixed distance (the radius r) from the center. In Manhattan geometry distance is determined differently than in Euclidean geometry, and circles become squares with sides at 45° to the coordinate axes.
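The three L_p cases above can be sketched in a few lines of Python (an illustrative implementation, not from the slides):

```python
import numpy as np

def minkowski(a, b, p):
    """L_p (Minkowski) distance between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if np.isinf(p):
        return np.abs(a - b).max()            # L_inf: chessboard distance
    return (np.abs(a - b) ** p).sum() ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))       # 7.0  (city block)
print(minkowski(a, b, 2))       # 5.0  (Euclidean)
print(minkowski(a, b, np.inf))  # 4.0  (chessboard)
```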
6 Color histograms: L1 and Euclidean distance
When comparing two color histograms with the Manhattan or the Euclidean distance, take care that:
the L_1 and Euclidean distances produce many false negatives, because neighboring bins are not considered;
the Euclidean distance is only suited to the Lab and Luv color spaces.
Color histograms: Hamming distance
The Hamming distance assumes that histograms are binary vectors; it can detect the absence/presence of colors:
d_h(I_q, I_d) = Σ_{j=1..n} H_j(I_q) XOR H_j(I_d)
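The XOR form above reduces to a few lines of NumPy; the 8-bin binary histograms here are hypothetical values for illustration:

```python
import numpy as np

# Hypothetical 8-bin binary color histograms: 1 = color present, 0 = absent
h1 = np.array([1, 0, 1, 1, 0, 0, 1, 0])
h2 = np.array([1, 1, 0, 1, 0, 0, 1, 1])

# Hamming distance = number of bins whose presence/absence differs (XOR)
d_hamming = int(np.sum(h1 ^ h2))
print(d_hamming)  # 3
```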
7 Cosine distance
The cosine distance derives from the definition of the dot product between two vectors. Geometrically, the dot product means that a and b are drawn from a common start point and the length of a is multiplied by the length of the component of b that points in the same direction as a. The cosine distance measures how much a is not aligned with b:
d(a,b) = 1 − cos(a,b) = 1 − (A · B)/(|A| |B|) = 1 − Σ_{i=1..m} x_i y_i / sqrt( Σ_{i=1..m} x_i^2 · Σ_{i=1..m} y_i^2 )
Properties: metric; only the angle is relevant, not the vector lengths.
Example: F_1 = 2x_1 + 3x_2 + 5x_3, F_2 = 3x_1 + 7x_2 + x_3, Q = 0x_1 + 0x_2 + 2x_3. Q is closer to F_1 than to F_2.
Weighted Mean Variance
The Weighted Mean Variance (WMV) distance includes some minimal information about the data distribution:
D_r(I,J) = |µ_r(I) − µ_r(J)| / σ(µ_r) + |σ_r(I) − σ_r(J)| / σ(σ_r)
WMV is particularly quick because the calculation is simple and the values can be precomputed offline.
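A short Python check of the slide's example (the formula is the one above; the implementation is a sketch):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(angle between a and b); depends only on direction, not length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Vectors from the slide's example
F1 = np.array([2.0, 3.0, 5.0])
F2 = np.array([3.0, 7.0, 1.0])
Q  = np.array([0.0, 0.0, 2.0])

print(cosine_distance(Q, F1) < cosine_distance(Q, F2))  # True: Q is closer to F1
```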
8 Non-metric distances
With vector data:
Heuristic: Minkowski form with p < 1; Mahalanobis
Working with distributions (histograms):
Nonparametric test statistics: Kolmogorov-Smirnov (KS), Cramer/von Mises (CvM), χ2 (Chi Square)
Ground distance measures: histogram intersection, quadratic form (QF), Earth Mover's Distance (EMD)
Information-theoretic divergences: Kullback-Leibler (KL), Jeffrey divergence (JD)
Effects of variance and covariance on Euclidean distance
The ellipse shows the 50% contour of a hypothetical population. The Euclidean distance is not suited to account for differences in variance between the variables, nor for correlations between variables. Points A and B have similar Euclidean distances from the mean, but point B is more different from the population than point A. This is particularly critical for effects connected to human perception in low-level feature image matching. In this case the Mahalanobis distance should be used.
9 Mahalanobis (quadratic) distance
The quadratic-form distance accounts for correlation between features:
d^2(A,B) = Σ_{i=1..m} Σ_{j=1..m} (x_i − y_i) w_ij (x_j − y_j) = (A − B)^T W (A − B)
where W is derived from the covariance matrix: the diagonal terms express the variance in each dimension and the off-diagonal terms indicate the dependency between variables.
Properties: metric only if w_ij = w_ji and w_ii = 1; non-metric otherwise.
Geometric interpretation of metric distances
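A minimal sketch of the quadratic form, taking W as the inverse covariance matrix (the usual choice that makes the distance discount high-variance directions); the 2-D population is hypothetical:

```python
import numpy as np

def mahalanobis_sq(a, b, W):
    """Quadratic-form distance (A - B)^T W (A - B)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(d @ W @ d)

# Hypothetical 2-D population: variable 2 has much larger variance
cov = np.array([[1.0,  0.0],
                [0.0, 25.0]])
W = np.linalg.inv(cov)  # inverse covariance

mean = np.array([0.0, 0.0])
A = np.array([0.0, 5.0])   # far along the high-variance axis
B = np.array([5.0, 0.0])   # equally far in Euclidean terms

print(mahalanobis_sq(A, mean, W))  # 1.0  -> A is typical for the population
print(mahalanobis_sq(B, mean, W))  # 25.0 -> B is an outlier
```

A and B are Euclidean-equidistant from the mean, but only B is flagged as atypical, mirroring the ellipse example above.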
10 Color histograms: Mahalanobis distance
The Mahalanobis distance is used for color histogram similarity as it closely resembles human perception:
d_h(A,B) = (H(I_q) − H(I_d))^T A (H(I_q) − H(I_d))
with A = [a_ij] the similarity matrix denoting the similarity between bins i and j of N feature vectors x_1, ..., x_N, each of length n.
The quadratic-form distance in image retrieval results in false positives, because it tends to overestimate the mutual similarity of color distributions without a pronounced mode: the same mass in a given bin of the first histogram is simultaneously made to correspond to masses contained in different bins of the other histogram.
Histogram intersection
Histogram intersection helps to check the occurrence of an object in a region (H_obj[j] < H_reg[j]). Histogram intersection is not symmetric:
d_h(I_q, I_d) = 1 − Σ_{j=1..n} min(H_j(I_q), H_j(I_d)) / Σ_{j=1..n} H_j(I_d)
Histogram intersection is widely used because of its ability to handle partial matches when the areas of the two histograms are different.
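The intersection formula above, in a short sketch (histogram values are made up); note how swapping the arguments changes the normalising denominator, which is where the asymmetry comes from:

```python
import numpy as np

def histogram_intersection_distance(h_query, h_db):
    """1 - (sum of bin-wise minima) / (total mass of the database histogram).
    Not symmetric: the normalisation uses only the second argument."""
    h_query, h_db = np.asarray(h_query, float), np.asarray(h_db, float)
    return 1.0 - np.minimum(h_query, h_db).sum() / h_db.sum()

h1 = np.array([4.0, 2.0, 0.0, 2.0])  # total mass 8
h2 = np.array([2.0, 2.0, 2.0, 0.0])  # total mass 6

print(histogram_intersection_distance(h1, h2))  # 0.333...
print(histogram_intersection_distance(h2, h1))  # 0.5 -> asymmetric
```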
11 Cumulative difference distances
Kolmogorov-Smirnov distance (KS): D_r(I,J) = max_i |F_r(i;I) − F_r(i;J)|
Cramer/von Mises distance (CvM): D_r(I,J) = Σ_i (F_r(i;I) − F_r(i;J))^2
where F_r(i;·) is the marginal cumulative histogram distribution.
Both the Kolmogorov-Smirnov and the Cramer/von Mises distance are statistical measures of the underlying similarity of two unbinned distributions. They work only for 1D data or cumulative histograms. They are non-symmetric distance functions.
Cumulative histogram
The cumulative histogram describes the probability that a random variable X with a certain pdf will be found at a value less than or equal to x.
12 Cumulative difference example
[Figure: two histograms, their cumulative difference, and the resulting K-S and CvM values.]
χ2 distance
The χ2 distance measures the underlying similarity of two samples, with differences emphasized:
D(I,J) = Σ_i (f(i;I) − f̂(i))^2 / f̂(i), where f̂(i) = [f(i;I) + f(i;J)] / 2 is the expected frequency.
The χ2 distance measures how unlikely it is that one distribution was drawn from the population represented by the other.
The major drawback of these measures is that they account only for the correspondence between bins with the same index and do not use information across bins.
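The χ2 formula above, sketched in Python with made-up histogram counts (empty bins are skipped to avoid division by zero):

```python
import numpy as np

def chi_square_distance(f_i, f_j):
    """Chi-square histogram distance with expected frequency (f_i + f_j) / 2."""
    f_i, f_j = np.asarray(f_i, float), np.asarray(f_j, float)
    f_hat = (f_i + f_j) / 2.0
    mask = f_hat > 0                      # skip bins that are empty in both
    return float(((f_i[mask] - f_hat[mask]) ** 2 / f_hat[mask]).sum())

h1 = np.array([10.0, 5.0, 0.0, 5.0])
h2 = np.array([6.0, 5.0, 4.0, 5.0])
print(chi_square_distance(h1, h2))  # 2.5
```

Since f_i − f̂ = (f_i − f_j)/2, this version is symmetric in its two arguments.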
13 Earth Mover's distance
The Earth Mover's distance (EMD) between two distributions x and y represents the minimum work needed to morph one distribution into the other. Informally, the two distributions represent different ways of amassing the same amount of material over a region D, and the EMD is given by the amount of mass times the distance by which it is moved (f_ij: amount of mass moved from x_i to y_j; d_ij: distance from x_i to y_j).
Given feature vectors with their associated feature weights A = {(x_i, w_i)} and B = {(y_j, u_j)}, and a flow f_ij expressing the mass flowing from x_i to y_j over a distance d_ij:
d_h(A,B) = min_F Σ_{i,j} f_ij d_ij / min(W,U)
provided that: f_ij ≥ 0; Σ_j f_ij ≤ w_i; Σ_i f_ij ≤ u_j; Σ_{ij} f_ij = min(W,U), with W = Σ_i w_i and U = Σ_j u_j.
Properties: respects scaling; metric if d is metric and W = U. If W ≠ U: no positivity (the surplus is not taken into account) and no triangle inequality. It is the only measure that works on distributions with a different number of bins. It is widely used for color, edge and motion-vector histograms, but has a high computational cost.
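In the general case the minimisation above requires a linear-programming solver, but for 1-D histograms with equal total mass and unit bin spacing the EMD reduces to summing the absolute cumulative difference bin by bin. A sketch under those assumptions (histogram values are illustrative):

```python
def emd_1d(h1, h2):
    """1-D EMD for histograms with equal total mass and unit bin spacing:
    sum of absolute differences of the cumulative histograms."""
    work, carry = 0.0, 0.0
    for a, b in zip(h1, h2):
        carry += a - b        # earth carried over to the next bin
        work += abs(carry)    # moving it one bin costs |carry|
    return work

h1 = [0.5, 0.5, 0.0, 0.0]
h2 = [0.0, 0.0, 0.5, 0.5]
print(emd_1d(h1, h2))  # 2.0: each half of the mass moves two bins
```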
14 Moving earth with histograms
Considering two histograms H_1 and H_2, defined for example in a color space, pixels can be regarded as the unit of mass to be transported from one distribution to the other. The transport has to be based on some metric of distance between individual features.
15 [Figure: moving earth between two histograms.]
16 Computing the Earth Mover's distance: (amount moved) × (distance moved). With variable-length representations, P with m clusters and Q with n clusters:
work = Σ_{i=1..m} Σ_{j=1..n} f_ij d_ij
17 Constraints
1. Move earth only from P to Q.
2. Cannot send more earth than there is.
18 Constraints (continued)
3. Q cannot receive more earth than it can hold.
4. As much earth as possible must be moved.
19 Kullback-Leibler distance
The Kullback-Leibler distance considers histograms as distributions and measures their similarity by calculating the relative entropy, i.e. the cost of encoding one distribution as another. In other words, it measures how well one distribution can be coded using the other as a codebook. With Σ_i H_i[I_q] = Σ_i H_i[I_d] = 1 and H_i[I_q], H_i[I_d] ≥ 0:
d_h(I_q, I_d) = Σ_{i=1..n} H_i(I_q) log( H_i(I_q) / H_i(I_d) )
The Kullback-Leibler divergence is not symmetric. It can be used to determine how far away a probability distribution P is from another distribution Q, e.g. as a distance measure between two documents. It does not necessarily match perceptual similarity well and is sensitive to histogram binning.
Jeffrey divergence
d_h(I_q, I_d) = Σ_{i=1..n} [ H_i(I_q) log( H_i(I_q) / m_i ) + H_i(I_d) log( H_i(I_d) / m_i ) ], with m_i = (H_i(I_q) + H_i(I_d)) / 2
The Jeffrey divergence is an empirical modification of the KL divergence that is numerically stable, symmetric and robust with respect to noise and to the size of the histogram bins.
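A small sketch of both divergences (the two-bin distributions are made up for illustration):

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence: cost of coding distribution p using q as the codebook."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # bins with zero mass contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffrey_divergence(p, q):
    """Symmetric, numerically stable modification: compare each histogram
    against the bin-wise mean m = (p + q) / 2."""
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2.0
    return kl_divergence(p, m) + kl_divergence(q, m)

p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])

print(round(kl_divergence(p, q), 4))  # 0.1438
print(round(kl_divergence(q, p), 4))  # 0.1308 -> not symmetric
print(np.isclose(jeffrey_divergence(p, q), jeffrey_divergence(q, p)))  # True
```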
20 Distance properties summary
L_p: Minkowski form
WMV: Weighted Mean Variance
χ2: Chi Square
KS: Kolmogorov-Smirnov
CvM: Cramer/von Mises
KL: Kullback-Leibler
JD: Jeffrey divergence
QF: Quadratic form
EMD: Earth Mover's Distance
[Table: metric and non-metric properties of each distance.]
Examples using color (CIE Lab): L1 distance, Jeffrey divergence, χ2 statistics, quadratic-form distance, Earth Mover's Distance.
21 Image lookup: merging similarities
When several features are considered, the distances computed between each pair of feature vectors can be merged to evaluate the full similarity. The combination of distances can be performed according to different policies, e.g. to combine k different feature distances d_i such as color, texture and shape distances:
Linear weighting (weighted average) of the k distances.
Non-linear weighting (α-trimmed mean): average only the α percent highest of the k values.
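Both policies fit in a few lines; the per-feature distances and weights below are hypothetical:

```python
# Hypothetical per-feature distances (color, texture, shape) and weights
distances = {"color": 0.2, "texture": 0.6, "shape": 0.4}
weights   = {"color": 0.5, "texture": 0.3, "shape": 0.2}

# Linear weighting: weighted average of the k feature distances
d_linear = sum(weights[f] * distances[f] for f in distances) / sum(weights.values())
print(round(d_linear, 2))  # 0.36

# Alpha-trimmed mean: average only the highest distances (here the top 2 of 3)
alpha_k = 2
d_trimmed = sum(sorted(distances.values(), reverse=True)[:alpha_k]) / alpha_k
print(d_trimmed)  # 0.5
```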
22 Distances for symbolic representations
In some cases features are represented as strings of symbols: spatial relations, temporal features, semantic content, etc. In these cases edit distances can be used, which count the number of changes required to transform one string into the other. The edit operations considered are:
Insertion: an extra character is inserted into the string.
Deletion: a character is removed from the string.
Transposition: two characters are reversed in their sequence.
Substitution: an insertion followed by a deletion.
Hamming and Levenshtein distances
The Hamming distance (seen for histograms) is suited to computing edit distances between binary vectors; the Needleman-Wunsch distance (a specialization of the Levenshtein edit distance) is computed between the components of the feature vectors. Example: N-W distance between A and B: 6 = (4 − 2) + (4 − 2) + (4 − 2).
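The Levenshtein distance counting insertions, deletions and substitutions can be sketched with the classic dynamic-programming recurrence (an illustrative implementation, not from the slides):

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions
    turning string s into string t (classic dynamic programming)."""
    prev = list(range(len(t) + 1))       # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```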
Feature detectors and descriptors Fei-Fei Li Feature Detection e.g. DoG detected points (~300) coordinates, neighbourhoods Feature Description e.g. SIFT local descriptors (invariant) vectors database of
More informationMATH 614 Dynamical Systems and Chaos Lecture 6: Symbolic dynamics.
MATH 614 Dynamical Systems and Chaos Lecture 6: Symbolic dynamics. Metric space Definition. Given a nonempty set X, a metric (or distance function) on X is a function d : X X R that satisfies the following
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationActive learning in sequence labeling
Active learning in sequence labeling Tomáš Šabata 11. 5. 2017 Czech Technical University in Prague Faculty of Information technology Department of Theoretical Computer Science Table of contents 1. Introduction
More informationFundamentals of Similarity Search
Chapter 2 Fundamentals of Similarity Search We will now look at the fundamentals of similarity search systems, providing the background for a detailed discussion on similarity search operators in the subsequent
More informationAlgorithms for Picture Analysis. Lecture 07: Metrics. Axioms of a Metric
Axioms of a Metric Picture analysis always assumes that pictures are defined in coordinates, and we apply the Euclidean metric as the golden standard for distance (or derived, such as area) measurements.
More informationReview (Probability & Linear Algebra)
Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More information3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.
Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that
More informationData preprocessing. DataBase and Data Mining Group 1. Data set types. Tabular Data. Document Data. Transaction Data. Ordered Data
Elena Baralis and Tania Cerquitelli Politecnico di Torino Data set types Record Tables Document Data Transaction Data Graph World Wide Web Molecular Structures Ordered Spatial Data Temporal Data Sequential
More informationNoisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions 1
Noisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions 1 B. J. Oommen and R. K. S. Loke School of Computer Science
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors Frank Nielsen 1 Paolo Piro 2 Michel Barlaud 2 1 Ecole Polytechnique, LIX, Palaiseau, France 2 CNRS / University of Nice-Sophia Antipolis, Sophia
More informationLesson 3-7: Absolute Value Equations Name:
Lesson 3-7: Absolute Value Equations Name: In this activity, we will learn to solve absolute value equations. An absolute value equation is any equation that contains an absolute value symbol. To start,
More informationVisualization of distance measures implied by forecast evaluation criteria
Visualization of distance measures implied by forecast evaluation criteria Robert M. Kunst kunst@ihs.ac.at Institute for Advanced Studies Vienna and University of Vienna Presentation at RSS International
More informationDetectors part II Descriptors
EECS 442 Computer vision Detectors part II Descriptors Blob detectors Invariance Descriptors Some slides of this lectures are courtesy of prof F. Li, prof S. Lazebnik, and various other lecturers Goal:
More informationGaussian Mixture Distance for Information Retrieval
Gaussian Mixture Distance for Information Retrieval X.Q. Li and I. King fxqli, ingg@cse.cuh.edu.h Department of omputer Science & Engineering The hinese University of Hong Kong Shatin, New Territories,
More informationA Program for Data Transformations and Kernel Density Estimation
A Program for Data Transformations and Kernel Density Estimation John G. Manchuk and Clayton V. Deutsch Modeling applications in geostatistics often involve multiple variables that are not multivariate
More informationPROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE. Noboru Murata
' / PROPERTIES OF THE EMPIRICAL CHARACTERISTIC FUNCTION AND ITS APPLICATION TO TESTING FOR INDEPENDENCE Noboru Murata Waseda University Department of Electrical Electronics and Computer Engineering 3--
More informationTOPCAT basics. Modern Astrophysics Techniques. Contact: Mladen Novak,
TOPCAT basics Modern Astrophysics Techniques Contact: Mladen Novak, mlnovak@phy.hr What is TOPCAT? TOPCAT= Tool for OPeraBons on Catalogues And Tables hep://www.star.bris.ac.uk/~mbt/topcat/ Useful, because
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#8:(November-08-2010) Cancer and Signals Outline 1 Bayesian Interpretation of Probabilities Information Theory Outline Bayesian
More informationPractical Statistics
Practical Statistics Lecture 1 (Nov. 9): - Correlation - Hypothesis Testing Lecture 2 (Nov. 16): - Error Estimation - Bayesian Analysis - Rejecting Outliers Lecture 3 (Nov. 18) - Monte Carlo Modeling -
More informationLinear algebra. NEU 466M Instructor: Professor Ila R. Fiete Spring 2016
Linear algebra NEU M Instructor: Professor Ila R. Fiete Spring 01 NotaBon Matrices: upper-case A, B, U, W Vector: bold, (usually) lower-case x, y, v, w x! x (handwribng: ) Elements of matrix, vector: lower-case
More informationAchieving scale covariance
Achieving scale covariance Goal: independently detect corresponding regions in scaled versions of the same image Need scale selection mechanism for finding characteristic region size that is covariant
More informationRandom Number Generation. CS1538: Introduction to simulations
Random Number Generation CS1538: Introduction to simulations Random Numbers Stochastic simulations require random data True random data cannot come from an algorithm We must obtain it from some process
More informationDissimilarity and matching
8 Dissimilarity and matching Floriana Esposito, Donato Malerba and Annalisa Appice 8.1 Introduction The aim of symbolic data analysis (SDA) is to investigate new theoretically sound techniques by generalizing
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationSignal Modeling Techniques in Speech Recognition. Hassan A. Kingravi
Signal Modeling Techniques in Speech Recognition Hassan A. Kingravi Outline Introduction Spectral Shaping Spectral Analysis Parameter Transforms Statistical Modeling Discussion Conclusions 1: Introduction
More information