Comparing Dissimilarity Representations
Slides 1–2: Comparing Dissimilarity Representations
Adam Cardinal-Stakenas, with Carey E. Priebe, Zhiliang Ma, and Youngser Park
Department of Applied Mathematics & Statistics, Johns Hopkins University
Interface, Durham, NC
Slide 3: Outline
- Introduction: The Statistical Inference Problem; Dissimilarities
- Approaches: Framework
- Multiple Dissimilarities: Motivation
- Examples: Image/Caption Fusion
Slides 4–6: The Problem: Classification
Setup: $(X_i, Y_i) \overset{\text{i.i.d.}}{\sim} F_{XY}$, with $X_i : \Omega \to \Xi$ and $Y_i : \Omega \to \{1, \dots, J\}$.
Examples: image analysis, text analysis, graph data, computational anatomy.
Goal: to obtain superior performance in the exploitation task using comparisons of the data.
Slides 7–9: Dissimilarities
Definition: a dissimilarity measure is a function $\delta : \Xi \times \Xi \to \mathbb{R}^+ \cup \{0\}$ with:
1. Positivity: $\delta(x_1, x_2) \geq 0$,
2. Identifiability: $\delta(x_1, x_2) = 0 \iff x_1 = x_2$,
3. Reflexivity: $\delta(x, x) = 0$,
4. Symmetry: $\delta(x_1, x_2) = \delta(x_2, x_1)$.
A dissimilarity representation for a set of $n$ objects is expressed as a symmetric, hollow matrix with positive off-diagonal entries.
We may observe numerous dissimilarities. That is, for a given $\Xi$, we may have available, or be able to calculate, $\delta_1, \delta_2, \dots, \delta_k$, where $\delta_i : \Xi \times \Xi \to \mathbb{R}$ is a dissimilarity function for $i \in \{1, \dots, k\}$.
Slide 10: Example: $\mathbb{R}^2$
[Figure: three points $a$, $b$, $c$ in the plane, with a table of $\delta(a,b)$, $\delta(b,c)$, $\delta(a,c)$ under the Manhattan, Euclidean, Minkowski ($p$), and supremum distances.]
Manhattan distance: $\delta_{\text{Man.}} = |x_1 - x_2| + |y_1 - y_2|$
Euclidean distance: $\delta_{\text{Euc.}} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
Minkowski distance: $\delta_{\text{Min.},p} = \left(|x_1 - x_2|^p + |y_1 - y_2|^p\right)^{1/p}$
Supremum distance: $\delta_{\text{Sup.}} = \max\{|x_1 - x_2|, |y_1 - y_2|\}$
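As a concrete illustration, here is a minimal Python sketch computing the four distances for three points in the plane. The coordinates are placeholders, since the numeric table on the original slide was not recoverable.

```python
import numpy as np

def manhattan(u, v):
    # L1 norm: sum of coordinate-wise absolute differences
    return np.sum(np.abs(u - v))

def euclidean(u, v):
    # L2 norm
    return np.sqrt(np.sum((u - v) ** 2))

def minkowski(u, v, p):
    # Lp norm; recovers Manhattan at p = 1 and Euclidean at p = 2
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

def supremum(u, v):
    # L-infinity norm: the limit of the Minkowski distance as p -> infinity
    return np.max(np.abs(u - v))

# Placeholder points (the slide's actual coordinates are assumed, not given).
a, b, c = np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([3.0, 4.0])

for name, d in [("Manhattan", manhattan), ("Euclidean", euclidean),
                ("Minkowski p=3", lambda u, v: minkowski(u, v, 3)),
                ("Supremum", supremum)]:
    print(f"{name:14s} d(a,b)={d(a,b):.3f} d(b,c)={d(b,c):.3f} d(a,c)={d(a,c):.3f}")
```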
Slide 11: Outline (section divider).
Slides 12–15: Framework
Three dissimilarity classification approaches.
[Figure: dissimilarity matrix ($n \times n$), dissimilarity matrix ($n \times n$), and feature matrix ($n \times d$), labeled neighborhood-based, dissimilarity space-based, and embedding, respectively.]
1. The neighborhood-based approach interprets dissimilarities as neighborhood relations.
2. The dissimilarity space approach defines a representation set $R = \{p_1, \dots, p_r\}$ and interprets the dissimilarities from a point to each element of the representation set as features of that point.
3. The embedding approach embeds the dissimilarities into $\mathbb{R}^d$ in such a way that the configuration's interpoint distances approximate the original dissimilarities (see the sketch below).
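The following is a minimal sketch, assuming NumPy and scikit-learn, of how each of the three approaches consumes an $n \times n$ dissimilarity matrix; the toy data, labels, and representation set are invented for illustration.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))               # toy data in Xi = R^5
y = (X[:, 0] > 0).astype(int)              # toy labels
# n x n matrix of pairwise Euclidean dissimilarities
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# 1. Neighborhood-based: k-NN run directly on the precomputed dissimilarities.
knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed").fit(D, y)

# 2. Dissimilarity space: dissimilarities to a representation set R
#    become the feature vector of each point (an n x r feature matrix).
R_idx = rng.choice(len(X), size=10, replace=False)
features = D[:, R_idx]

# 3. Embedding: an MDS configuration in R^2 whose interpoint distances
#    approximate the original dissimilarities.
Z = MDS(n_components=2, dissimilarity="precomputed",
        random_state=0).fit_transform(D)
```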
Slide 16: [Figure: diagram of the classification pipelines. Raw data $X \in \Xi$ is mapped by feature extraction procedures into $\mathbb{R}^{d_1}$, $\mathbb{R}^{d_2}$, ..., or compared directly via a dissimilarity to form a dissimilarity matrix $D$; each path ends in a classifier producing an estimated label $\hat{L}$.]
Legend: $F$, $F'$, $F''$: feature extraction procedures; $\delta \in \{\delta_1, \dots, \delta_k\}$: dissimilarity function; $g$: classifier; $g_\delta$: dissimilarity-based classifier.
Slides 17–21: Comparing Dissimilarities
To choose a dissimilarity, a classifier-independent methodology for evaluating a dissimilarity representation is needed.
To achieve this, we measure the congruence from an observed dissimilarity matrix to $Y$, the ideal dissimilarity matrix (from the training data).
We define congruence using the multidimensional scaling criterion function [Borg & Groenen]:
$$\mu : \mathcal{D} \times \mathcal{D} \to [0, 1], \qquad \mu(d(X), \delta) = \frac{\sum_{i<j} d(X_i, X_j)\,\delta_{ij}}{\sqrt{\sum_{i<j} d(X_i, X_j)^2 \, \sum_{i<j} \delta_{ij}^2}}.$$
We calculate $c(\cdot) = \mu(\cdot, Y)$. Larger values indicate stronger correlation with the ideal dissimilarity.
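A small NumPy sketch of this computation follows. Note that the 0/1 construction of the ideal dissimilarity $Y$ (zero within a class, one between classes) is an assumption for illustration; the slides do not spell out its exact form.

```python
import numpy as np

def congruence(D, Delta):
    """mu(D, Delta): cosine of the angle between the strict upper
    triangles of two n x n dissimilarity matrices; lies in [0, 1]."""
    iu = np.triu_indices_from(D, k=1)
    d, delta = D[iu], Delta[iu]
    return (d @ delta) / np.sqrt((d @ d) * (delta @ delta))

def ideal_dissimilarity(y):
    # Assumed form of Y: 0 within a class, 1 between classes.
    y = np.asarray(y)
    return (y[:, None] != y[None, :]).astype(float)

y = [0, 0, 1, 1]
D = np.array([[0, 1, 4, 5],
              [1, 0, 3, 4],
              [4, 3, 0, 1],
              [5, 4, 1, 0]], dtype=float)
print(congruence(D, ideal_dissimilarity(y)))   # c(D) = mu(D, Y)
```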
Slides 22–26: Motivation
In the style of Devroye, Györfi, and Lugosi, the typical comparison measures for a classification problem involve calculating some quantity based on the posterior probabilities. For example:
$$L^* = E\{\min(\eta(X), 1 - \eta(X))\}$$
$$L_{NN} = E\{2\eta(X)(1 - \eta(X))\}$$
$$\rho = E\left\{\sqrt{\eta(X)(1 - \eta(X))}\right\}$$
The congruence coefficient is classifier-independent in the sense that it depends only on the data; however, it is not (obviously) asymptotically related to the posteriors.
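These quantities can be estimated by Monte Carlo once $\eta$ is known. A sketch under an assumed toy model, equal priors with class-conditional densities $N(0,1)$ and $N(1,1)$, using NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200_000
# Assumed toy model: equal priors, class 0 ~ N(0,1), class 1 ~ N(1,1).
y = rng.random(n) < 0.5
x = np.where(y, rng.normal(1, 1, n), rng.normal(0, 1, n))

# Posterior eta(x) = P(Y = 1 | X = x), known in closed form for this model.
p0, p1 = norm.pdf(x, 0, 1), norm.pdf(x, 1, 1)
eta = p1 / (p0 + p1)

L_star = np.mean(np.minimum(eta, 1 - eta))   # Bayes error L*
L_nn   = np.mean(2 * eta * (1 - eta))        # asymptotic 1-NN error
rho    = np.mean(np.sqrt(eta * (1 - eta)))   # rho = E sqrt(eta(1 - eta))
print(L_star, L_nn, rho)
```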
Slide 27: Outline (section divider).
Slide 28: Image/Caption Fusion — Motivation
Example: image/caption fusion.
- Images and text are most naturally considered marginally.
- The geometries of image spaces and caption spaces are not well understood.
- Our fusion methodologies allow for subsequent joint analyses.
[Figure: example image/caption pairs (Images I, Captions C). The four truncated captions describe Sri Lankan troops looking after seized weapons, Tiger Woods letting go of his club after a wayward tee shot, a Detroit Tigers shortstop throwing to first in a double play, and a two-month-old Indochinese tiger cub in its cage.]
Slide 29: Image/Caption Fusion — The Data
140,577 images & captions were collected from the Yahoo! Photos website. 1,600 pairs were selected using the query word "tiger" on captions. They were labeled manually, based only on captions:

  label                                   #
  animal tiger                          148
  Detroit Tigers (baseball team)        145
  Tiger Woods (the golfer)              897
  Tamil Tigers (soldiers of Sri Lanka)  330
  Leicester Tigers (rugby team)          48
  others                                 32

Two-class problem: Tiger Woods vs. Tamil Tigers.
Slide 30: [Figure: diagram of the caption-processing pipelines on $\mathcal{C} \subset \Xi$, producing representations $X_1 \in \mathbb{R}^{d_1}, \dots, X_4 \in \mathbb{R}^{d_4}$ via the dissimilarities below and multidimensional scaling into 1-D, 2-D, and 3-D configurations.]
Legend: WC: word-count histogram; MI: mutual information; $\delta_1$ = Euclidean distance; $\delta_2$ = cosine distance; $\delta_3$ = random-forest proximity; m.d.s. = multidimensional scaling.
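A minimal sketch of two of the caption dissimilarities named in the legend, $\delta_1$ (Euclidean) and $\delta_2$ (cosine) on word-count histograms, using scikit-learn and SciPy; the captions here are invented stand-ins for the slide's examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from scipy.spatial.distance import cosine, euclidean

captions = [
    "tiger woods hits a wayward tee shot",          # made-up captions in the
    "tamil tiger troops hand over seized weapons",  # spirit of the slides
]
# WC: word-count histogram representation of each caption
wc = CountVectorizer().fit_transform(captions).toarray().astype(float)

delta_1 = euclidean(wc[0], wc[1])  # delta_1: Euclidean distance on WC histograms
delta_2 = cosine(wc[0], wc[1])     # delta_2: cosine distance (1 - cos similarity)
print(delta_1, delta_2)
```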
Slide 31: Comparing Dissimilarities
[Figure: density estimates of the c-function for various caption dissimilarities: cosine-distance embedding, random-forest embedding, word count, M.I. with discretization, cosine distance, random forest.]
Slide 32: Comparing Dissimilarities
[Figure: density estimates of the estimated probability of misclassification for the caption dissimilarities: cosine distance, random forest, word count, M.I. with discretization.]
Slide 33: Conclusion and Future Work
- We can evaluate how closely a particular dissimilarity representation matches the dissimilarities between the class labels.
- This measure relates to classifier performance, but it does not depend on the use of a particular classifier.
- More work is needed to investigate how the congruence coefficient changes with the dimensionality of the data.
- To improve the performance of the congruence coefficient for predicting classifier performance, we wish to investigate tailoring the ideal dissimilarity to the observed data.
Slide 34: Questions?
Adam Cardinal-Stakenas
Department of Applied Mathematics and Statistics, Johns Hopkins University
adam