Advanced Machine Learning & Perception
1 Advanced Machine Learning & Perception Instructor: Tony Jebara
2 Topic 6: Standard Kernels; Unusual Input Spaces for Kernels; String Kernels; Probabilistic Kernels; Fisher Kernels; Probability Product Kernels; Mixture Model Probability Product Kernels; Junction Tree Algorithm for Kernels
3 SVMs and Kernels Recall the SVM dual optimization: $L_D:\ \max_\alpha \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j k(x_i,x_j)$ subject to $\alpha_i \in [0,C]$ and $\sum_i \alpha_i y_i = 0$. Solve a QP using the Gram matrix $K$ where $K_{ij} = k(x_i,x_j)$. Then the KKT conditions give us $b$. The SVM prediction is then: $f(x) = \mathrm{sign}\left(\sum_i \alpha_i y_i k(x,x_i) + b\right)$. Kernels replace vector dot products to get a nonlinear boundary. Kernels also allow arbitrarily large structured input spaces!
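A minimal sketch (not from the slides) of this pipeline, assuming scikit-learn and NumPy with a toy RBF Gram matrix; `SVC(kernel="precomputed")` accepts a precomputed $K$ exactly as the dual formulation suggests:

```python
# Minimal sketch: train an SVM on a precomputed Gram matrix K.
# The RBF kernel and the synthetic data below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def rbf_gram(A, B, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = np.sign(X_train[:, 0] * X_train[:, 1])   # toy nonlinear labels
X_test = rng.normal(size=(10, 2))

K_train = rbf_gram(X_train, X_train)               # n_train x n_train
K_test = rbf_gram(X_test, X_train)                 # n_test  x n_train

svm = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
print(svm.predict(K_test))   # f(x) = sign(sum_i alpha_i y_i k(x, x_i) + b)
```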
4 Standard Kernels on Vectors Kernels usually take two input vectors and output a scalar. Polynomial Kernel (gives p twists in the SVM decision boundary): $k(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle = (x_i^T x_j + 1)^p$. RBF Kernel (phi is infinite dimensional): $k(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle = \exp\left(-\tfrac{1}{2\sigma^2}\|x_i - x_j\|^2\right)$. Hyperbolic Tangent Kernel: $k(x_i,x_j) = \langle\phi(x_i),\phi(x_j)\rangle = \tanh(\kappa\, x_i^T x_j - \delta)$. But why just deal with vector inputs? We can apply kernels between any type of input: graphs, strings, probabilities, structured spaces, etc.!
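A small sketch of the three standard kernels above on plain NumPy vectors; the parameter values (p, sigma, kappa, delta) are illustrative choices, not values from the lecture:

```python
# Polynomial, RBF and tanh kernels between two vectors.
import numpy as np

def polynomial_kernel(x, z, p=3):
    return (x @ z + 1.0) ** p

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma**2))

def tanh_kernel(x, z, kappa=1.0, delta=0.0):
    return np.tanh(kappa * (x @ z) - delta)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), rbf_kernel(x, z), tanh_kernel(x, z))
```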
5 String Kernels Inputs are two strings: X = aardvark, X′ = accent. Want k(x,x′) = scalar value = φ(x)ᵀφ(x′). One choice for the features φ(x) is the number of times each substring of length 1, 2 and 3 appears in X: φ(x) = [#a #b #c #d #e #f … #y #z #aa #ab #ac …] = [ ], φ(x′) = [ ]. Can do this efficiently via dynamic programming.
6 String Kernels Want k(x,x′) = scalar value = φ(x)ᵀφ(x′). The explicit kernel can be slow because of the many substrings s in the final vocabulary: φ_s(X) = number of times substring s appears in X, e.g. φ_a(aardvark) = 3, φ_ar(aardvark) = 2. The implicit kernel is more efficient: to compute k(x,x′), for each substring s of X count how often s appears in X′ via dynamic programming, in time proportional to the product of the lengths of X and X′. This computes the dot product much more quickly!
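A sketch of the explicit substring-count version (not the lecture's dynamic-programming algorithm, which avoids materializing the vocabulary); it just counts substrings of length 1-3 and takes the dot product of the two count vectors:

```python
# Explicit substring-count string kernel: phi_s(X) counts occurrences of
# substring s (lengths 1..3); k(X, X') is the dot product of count vectors.
from collections import Counter

def substring_counts(s, max_len=3):
    counts = Counter()
    for length in range(1, max_len + 1):
        for i in range(len(s) - length + 1):
            counts[s[i:i + length]] += 1
    return counts

def string_kernel(x, xp, max_len=3):
    cx, cxp = substring_counts(x, max_len), substring_counts(xp, max_len)
    return sum(cx[s] * cxp[s] for s in cx if s in cxp)

print(substring_counts("aardvark")["a"])    # 3
print(substring_counts("aardvark")["ar"])   # 2
print(string_kernel("aardvark", "accent"))
```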
7 Probability Kernels Inputs are two probabilities: p = p(x|θ) and p′ = p(x|θ′). Want k(p,p′) = scalar value. How to design this? Recall that a kernel can be converted into a distance: $D(p_i,p_j) = k(p_i,p_i) - 2k(p_i,p_j) + k(p_j,p_j)$. What is a natural distance between two probabilities? The Kullback-Leibler Divergence! $D(p_i \| p_j) = \int p_i(x)\log\frac{p_i(x)}{p_j(x)}\,dx$ — caveats?
8 Fisher Kernels KL isn't a symmetric distance, and probabilities lie on a manifold! So we must approximate it. Try a quadratic distance measure like the Mahalanobis distance: $D(p''\|p') \approx (\theta'' - \theta')^T I_{\theta^*} (\theta'' - \theta')$. Linearize the manifold at $\theta^*$ (some maximum likelihood point). Get the distance between p'' and p' via a distance from θ'' to θ'. The right kernel to go with the Mahalanobis distance is $k(p'',p') = U_{p''}^T\, I_{\theta^*}^{-1}\, U_{p'}$, where $U_{p''} = \nabla_\theta \log p(x|\theta)\big|_{\theta''}$ and $U_{p'} = \nabla_\theta \log p(x|\theta)\big|_{\theta'}$. The matrix $I_{\theta^*}$ is the Fisher information: $I_{\theta^*}(i,j) = \int p(x|\theta^*)\,\frac{\partial \log p(x|\theta)}{\partial\theta_i}\,\frac{\partial \log p(x|\theta)}{\partial\theta_j}\,dx$.
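A small sketch (my own example, not from the slides) assuming a 1D Gaussian with known variance σ² and free mean μ: the score is ∂/∂μ log N(x|μ,σ²) = (x−μ)/σ² and the Fisher information is 1/σ², so the Fisher kernel collapses to a (centered) linear kernel, consistent with the later slides' remark that the Fisher kernel for a Gaussian mean is linear:

```python
# Fisher kernel for a 1D Gaussian with fixed variance: U_x = (x - mu*)/sigma^2,
# I = 1/sigma^2, so k(x, x') = (x - mu*)(x' - mu*)/sigma^2 -- a linear kernel.
def fisher_kernel_gaussian(x, xp, mu_star, sigma=1.0):
    score = lambda v: (v - mu_star) / sigma**2      # U_v at the ML point mu*
    fisher_info = 1.0 / sigma**2                    # I_{theta*}
    return score(x) * (1.0 / fisher_info) * score(xp)

print(fisher_kernel_gaussian(2.0, -1.0, mu_star=0.5, sigma=1.0))
# equals (2.0 - 0.5) * (-1.0 - 0.5) = -2.25, i.e. linear in x - mu*
```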
9 Other Possible Divergences The Kullback-Leibler Divergence is asymmetric and tricky. All divergences satisfy $D(p,q) \geq 0$ and $D(p,q) = 0$ iff $p = q$. Consider KL, Jensen-Shannon and symmetrized KL: $KL(p,q) = \int p(x)\log\frac{p(x)}{q(x)}\,dx$, $JS(p,q) = \tfrac{1}{2}KL\!\left(p,\tfrac{p+q}{2}\right) + \tfrac{1}{2}KL\!\left(q,\tfrac{p+q}{2}\right)$, $SKL(p,q) = \tfrac{1}{2}KL(p,q) + \tfrac{1}{2}KL(q,p)$. sqrt(JS) is a true distance metric (Endres & Schindelin, 2003). The Hellinger divergence is symmetric & can be written as a kernel! $H(p,q) = \int\left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 dx = k(p,p) - 2k(p,q) + k(q,q)$ — what kernel is this?
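A sketch evaluating these divergences on two toy discrete distributions (my own numbers), including a numerical check that the Hellinger divergence equals k(p,p) − 2k(p,q) + k(q,q) when k is the Bhattacharyya kernel introduced on the next slide:

```python
# JS, symmetrized KL, and Hellinger on two discrete distributions.
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

kl = lambda a, b: np.sum(a * np.log(a / b))
m = (p + q) / 2
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)                 # Jensen-Shannon
skl = 0.5 * kl(p, q) + 0.5 * kl(q, p)                # symmetrized KL
hellinger = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

bhatt = lambda a, b: np.sum(np.sqrt(a * b))          # Bhattacharyya kernel
print(js, skl)
print(hellinger, bhatt(p, p) - 2 * bhatt(p, q) + bhatt(q, q))  # these match
```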
10 Bhattacharyya Kernel A kernel between distributions that gives the Hellinger divergence: $k(p,q) = \int \sqrt{p(x)}\sqrt{q(x)}\,dx$. A slight generalization is the Probability Product Kernel: $k(p,q) = \int p(x)^\rho\, q(x)^\rho\, dx$. Setting ρ = 1/2 gives the Bhattacharyya Kernel: $k(p,q) = \int \sqrt{p(x)}\sqrt{q(x)}\,dx$. Setting ρ = 1 gives the Expected Likelihood Kernel: $k(p,q) = \int p(x)q(x)\,dx = E_p[q(x)] = E_q[p(x)]$.
11 Probability Product Kernel Imagine we are given inputs which are probabilities, with labels. Dataset = {(p_1,y_1), (p_2,y_2), …, (p_n,y_n)}. E.g. each input i is a brand of car: p_i is its brakepad failure probability over time, and y_i is binary — did the government approve it for US roads? Build an SVM with Probability Product Kernels to predict whether a brakepad failure distribution will be approved/rejected by the gov't.
12 Probability Product Kernel Imagine we are given input data vectors with labels. Dataset = {(x_1,y_1), (x_2,y_2), …, (x_n,y_n)}. E.g. each input i is a brand of car: x_i is the time when one car of brand i failed, and y_i is binary — did the government approve it for US roads? For each i, estimate from x_i a probability p_i = p(x|θ_i), giving Dataset = {(p_1,y_1), (p_2,y_2), …, (p_n,y_n)}. Build an SVM with Probability Product Kernels to predict whether a brakepad failure distribution will be approved/rejected by the gov't.
13 Probability Product Kernel Imagine we are given input data vector-sets with labels. Dataset = {(χ_1,y_1), (χ_2,y_2), …, (χ_n,y_n)}. E.g. each input i is a brand of car: χ_i is a set of times when several cars of brand i failed, and y_i is binary — did the government approve it for US roads? For each i, estimate from χ_i = {x_i1, …, x_iT} a probability p_i = p(x|θ_i), giving Dataset = {(p_1,y_1), (p_2,y_2), …, (p_n,y_n)}. Build an SVM with Probability Product Kernels to predict whether a brakepad failure distribution will be approved/rejected by the gov't.
14 Gaussian Product Kernels For Gaussians with fixed spherical covariance: $k(p,p') = \int N^\rho(x|\mu)\,N^\rho(x|\mu')\,dx \propto \exp\left(-\tfrac{\rho}{4\sigma^2}\|\mu - \mu'\|^2\right)$ (if µ = x and µ′ = x′, we get back the RBF kernel; the Fisher kernel here is just a linear kernel). For Gaussians with variable covariance: $k(p,p') = \int N^\rho(x|\mu,\Sigma)\,N^\rho(x|\mu',\Sigma')\,dx \propto |\Sigma|^{-\rho/2}|\Sigma'|^{-\rho/2}|\tilde\Sigma|^{1/2}\exp\left(-\tfrac{\rho}{2}\mu^T\Sigma^{-1}\mu - \tfrac{\rho}{2}\mu'^T\Sigma'^{-1}\mu' + \tfrac{1}{2}\tilde\mu^T\tilde\Sigma\tilde\mu\right)$ where $\tilde\mu = \rho\Sigma^{-1}\mu + \rho\Sigma'^{-1}\mu'$ and $\tilde\Sigma = \left(\rho\Sigma^{-1} + \rho\Sigma'^{-1}\right)^{-1}$ (the Fisher kernel here is just a quadratic kernel).
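A sketch checking the fixed-covariance case numerically for ρ = 1/2; the closed form exp(−(μ−μ′)²/(8σ²)) is my own specialization of the formula above for 1D Gaussians with a shared variance, and the quadrature comparison uses toy parameters:

```python
# Bhattacharyya kernel between two 1D Gaussians with shared variance sigma^2,
# compared against numerical integration of sqrt(p(x) p'(x)).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, mu_p, sigma = 0.0, 1.5, 1.0

numerical, _ = quad(lambda x: np.sqrt(norm.pdf(x, mu, sigma) * norm.pdf(x, mu_p, sigma)),
                    -20, 20)
closed_form = np.exp(-(mu - mu_p) ** 2 / (8 * sigma**2))
print(numerical, closed_form)   # these agree
```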
15 Multinomial Product Kernels Bernoulli (binary x): $p(x|\theta) = \prod_{d=1}^D \gamma_d^{x_d}(1-\gamma_d)^{1-x_d}$, so $k(\chi,\chi') = \int p(x)^\rho\,(p'(x))^\rho\,dx = \prod_{d=1}^D\left[(\gamma_d\gamma_d')^\rho + \left((1-\gamma_d)(1-\gamma_d')\right)^\rho\right]$. Multinomial (discrete x): $p(x|\theta) = \prod_{d=1}^D \alpha_d^{x_d}$, so $k(\chi,\chi') = \sum_{d=1}^D (\alpha_d\alpha_d')^\rho$. For multinomial counts (N words per document): $k(\chi,\chi') = \left(\sum_{d=1}^D (\alpha_d\alpha_d')^{1/2}\right)^N$ (the Fisher kernel for the Multinomial is linear).
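A sketch of the multinomial Bhattacharyya kernel on word-frequency models of documents; the word counts below are made up for illustration:

```python
# Bhattacharyya kernel between two multinomials: sum_d sqrt(alpha_d alpha'_d),
# and the N-count document version (sum_d sqrt(alpha_d alpha'_d))^N.
import numpy as np

def multinomial_bhattacharyya(counts_a, counts_b, N=None):
    alpha_a = counts_a / counts_a.sum()     # ML multinomial parameters
    alpha_b = counts_b / counts_b.sum()
    k = np.sum(np.sqrt(alpha_a * alpha_b))
    return k if N is None else k ** N

doc_a = np.array([3.0, 0.0, 5.0, 2.0])      # toy word counts
doc_b = np.array([1.0, 4.0, 4.0, 1.0])
print(multinomial_bhattacharyya(doc_a, doc_b))
print(multinomial_bhattacharyya(doc_a, doc_b, N=10))
```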
16 Multinomial Product Kernels WebKB dataset: faculty vs. student web page SVM classifier. 20-fold cross-validation, 1641 student & 1124 faculty pages, … Use the Bhattacharyya Kernel on multinomials (word frequency). [Plots: results for training set size = 77 and training set size = 622.]
17 Exponential Family Kernels Exponential Family: Gaussian, Multinomial, Binomial, Poisson, Inverse Wishart, Gamma, Bernoulli, Dirichlet, … $p(x|\theta) = \exp\left(A(x) + T(x)^T\theta - K(\theta)\right)$. Maximum likelihood is straightforward: $\nabla K(\theta) = \frac{1}{n}\sum_n T(x_n)$. All have the above form but different A(x), T(x), and convex K(θ).
18 Exponential Family Kernels Compute the Bhattacharyya Kernel for the e-family $p(x|\theta) = \exp\left(A(x) + T(x)^T\theta - K(\theta)\right)$. Analytic solution for the e-family: $k(p,p') = \int p(x|\theta)^{1/2} p(x|\theta')^{1/2} dx = \exp\left(K\!\left(\tfrac{\theta}{2} + \tfrac{\theta'}{2}\right) - \tfrac{1}{2}K(\theta) - \tfrac{1}{2}K(\theta')\right)$. It only depends on the convex cumulant-generating function K(θ). Meanwhile, the Fisher Kernel is always linear in the sufficient stats: $U_\chi = \nabla_\theta \log p(\chi|\theta)\big|_{\theta^*} = T(\chi) - g$ with $g = \nabla K(\theta^*)$, so $k(\chi,\chi') = U_\chi^T I_{\theta^*}^{-1} U_{\chi'} = (T(\chi) - g)^T I_{\theta^*}^{-1} (T(\chi') - g)$.
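A sketch using the Poisson as the exponential-family example (my choice, not one worked in the slides): with natural parameter θ = log λ the cumulant function is K(θ) = e^θ = λ, so the formula above gives k = exp(√(λλ′) − λ/2 − λ′/2); the direct Bhattacharyya sum checks this numerically:

```python
# Bhattacharyya kernel between two Poissons via the cumulant-generating
# function, verified against a direct sum over the support.
import numpy as np
from scipy.stats import poisson

lam, lam_p = 3.0, 7.0

direct = sum(np.sqrt(poisson.pmf(x, lam) * poisson.pmf(x, lam_p)) for x in range(200))
via_cumulant = np.exp(np.sqrt(lam * lam_p) - lam / 2 - lam_p / 2)
print(direct, via_cumulant)   # these agree
```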
19 Gaussian Product Kernels Instead of a single x & x′, construct p & p′ from many x & x′ samples. Use a bag of vectors $\{\chi_1,\ldots,\chi_N\}$: $p(x) = N(x|\mu,\Sigma)$ with $\mu = \frac{1}{N}\sum_i \chi_i$ and $\Sigma = \frac{1}{N}\sum_i (\chi_i - \mu)(\chi_i - \mu)^T$; similarly $\{\chi'_1,\ldots,\chi'_N\}$ gives $p'(x) = N(x|\mu',\Sigma')$ with $\mu' = \frac{1}{N}\sum_i \chi'_i$ and $\Sigma' = \frac{1}{N}\sum_i (\chi'_i - \mu')(\chi'_i - \mu')^T$. Then $k(p,p') \propto |\Sigma|^{-\rho/2}|\Sigma'|^{-\rho/2}|\tilde\Sigma|^{1/2}\exp\left(-\tfrac{\rho}{2}\mu^T\Sigma^{-1}\mu - \tfrac{\rho}{2}\mu'^T\Sigma'^{-1}\mu' + \tfrac{1}{2}\tilde\mu^T\tilde\Sigma\tilde\mu\right)$.
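A sketch of the bag-of-vectors idea: fit a Gaussian to each bag and evaluate the product kernel between the fitted Gaussians. For simplicity this uses ρ = 1 (expected likelihood), where the integral has the compact form ∫N(x|μ,Σ)N(x|μ′,Σ′)dx = N(μ | μ′, Σ+Σ′), rather than the general-ρ formula above; the bags are synthetic:

```python
# Expected likelihood kernel between Gaussians fitted to two bags of vectors.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(bag):
    mu = bag.mean(axis=0)
    sigma = np.cov(bag, rowvar=False, bias=True) + 1e-6 * np.eye(bag.shape[1])
    return mu, sigma

def expected_likelihood_kernel(bag_a, bag_b):
    mu_a, sig_a = fit_gaussian(bag_a)
    mu_b, sig_b = fit_gaussian(bag_b)
    return multivariate_normal.pdf(mu_a, mean=mu_b, cov=sig_a + sig_b)

rng = np.random.default_rng(0)
bag1 = rng.normal(loc=0.0, size=(30, 2))
bag2 = rng.normal(loc=0.5, size=(30, 2))
print(expected_likelihood_kernel(bag1, bag2))
```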
20 Kernelized Gaussian Product Bhattacharyya affinity on Gaussians fitted to bags {x,...} & {x′,...} is invariant to the order of tuples in each bag. But it is too simple: just the overlap of Gaussians on images. We need more detail than the mean and covariance of pixels. Use another kernel to find the Gaussian in Hilbert space: $\kappa(\chi,\chi') = \phi(\chi)^T\phi(\chi')$. Never compute outer products; use the kernel, i.e. an infinite-dimensional RBF: $\kappa(\chi,\chi') = \exp\left(-\tfrac{1}{2\sigma^2}\|\chi - \chi'\|^2\right)$. Compute this mini-kernel between each pair of pixels in a given image. This gives a kernelized or augmented Gaussian µ and Σ via the Gram matrix.
21 Kernelized Gaussian Product Previously: $\mu = E\{\chi\}$, $\Sigma = E\{(\chi - \mu)(\chi - \mu)^T\}$. Now: $\mu = E\{\phi(\chi)\}$, $\Sigma = E\{(\phi(\chi) - \mu)(\phi(\chi) - \mu)^T\}$. Compute the Hilbert-space Gaussian's mean & covariance of each image: a bag or image becomes an N×N (or M×M) pixel Gram matrix with entries $\kappa(x_i,x_j)$ using the kappa kernel. Use kernel PCA to regularize the infinite-dimensional RBF Gaussian, then evaluate $|\Sigma|^{-\rho/2}|\Sigma'|^{-\rho/2}|\tilde\Sigma|^{1/2}\exp\left(-\tfrac{\rho}{2}\mu^T\Sigma^{-1}\mu - \tfrac{\rho}{2}\mu'^T\Sigma'^{-1}\mu' + \tfrac{1}{2}\tilde\mu^T\tilde\Sigma\tilde\mu\right)$ in that space. This puts all dimensions (X,Y,I) on an equal footing.
22 Kernelized Gaussian Product [Figures: letter R with 3 KPCA components of an RBF kernel; reconstruction of letter R with 1-4 KPCA components with an RBF kernel; reconstruction of letter R with 3 KPCA components with an RBF kernel + smoothing.]
23 Kernelized Gaussian Product x40 monochromatic images of crosses & squares, translating & scaling. SVM: train on 50, test on 50. The Fisher kernel for the Gaussian is a quadratic kernel. RBF kernel (red): 67% accuracy; Bhattacharyya (blue): 90% accuracy.
24 Kernelized Gaussian Product SVM classification of NIST digit images 0,1,…,9. Sample each image to get a bag of 30 (X,Y) pixels. Train on a random 120, test on a random 80 bags-of-vectors. Bhattacharyya outperforms the standard RBF due to built-in invariance. The Fisher kernel for the Gaussian is quadratic.
25 Mixture Product Kernels Beyond the Exponential Family: mixtures and hidden variables. Need ρ = 1, the Expected Likelihood kernel. With $p(x) = \sum_{m=1}^M p(m)\,p(x|m)$ and $p'(x) = \sum_{n=1}^N p'(n)\,p'(x|n)$: $k(\chi,\chi') = \int p(x)p'(x)\,dx = \sum_{m=1}^M\sum_{n=1}^N p(m)\,p'(n)\int p(x|m)\,p'(x|n)\,dx = \sum_{m=1}^M\sum_{n=1}^N p(m)\,p'(n)\,c_{m,n}$. Use M·N subkernel evaluations from our previous repertoire.
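A sketch for two 1D Gaussian mixtures, using the closed-form Gaussian subkernel $c_{m,n} = \int N(x|\mu_m,\sigma_m^2)N(x|\mu'_n,\sigma'^2_n)dx = N(\mu_m|\mu'_n,\sigma_m^2+\sigma'^2_n)$; the mixture parameters are toy values:

```python
# Expected likelihood kernel (rho = 1) between two 1D Gaussian mixtures.
import numpy as np
from scipy.stats import norm

def gmm_expected_likelihood_kernel(weights_a, means_a, vars_a,
                                   weights_b, means_b, vars_b):
    k = 0.0
    for wa, ma, va in zip(weights_a, means_a, vars_a):
        for wb, mb, vb in zip(weights_b, means_b, vars_b):
            c_mn = norm.pdf(ma, loc=mb, scale=np.sqrt(va + vb))  # subkernel c_{m,n}
            k += wa * wb * c_mn
    return k

print(gmm_expected_likelihood_kernel([0.7, 0.3], [0.0, 3.0], [1.0, 0.5],
                                     [0.5, 0.5], [0.5, 2.5], [1.0, 1.0]))
```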
26 HMM Product Kernels Hidden Markov Models (sequences): $p(x) = \sum_S p(s_0)\,p(x_0|s_0)\prod_{t=1}^T p(s_t|s_{t-1})\,p(x_t|s_t)$. The number of hidden configurations is large: #configs = $|s|^T$. Kernel: $k(\chi,\chi') = \int p(x)\,p'(x)\,dx = \sum_S\sum_U p(S)\,p'(U)\int p(x|S)\,p'(x|U)\,dx = \sum_S\sum_U p(S)\,p'(U)\,c_{S,U}$, which computes the cross-product of all hidden variables: $O(|s|^T|u|^T)$.
27 HMM Product Kernels Take advantage of the structure in HMMs via the Bayesian network: $k(\chi,\chi') = \sum_{S,U}\prod_t p(s_t|s_{t-1})\,p'(u_t|u_{t-1})\int p(x_t|s_t)\,p'(x_t|u_t)\,dx_t = \sum_{S,U}\prod_t p(s_t|s_{t-1})\,p'(u_t|u_{t-1})\,c_{s_t,u_t}$. Only compute subkernels for common parents: form clique potential functions such as $p(s_0)\,p'(u_0)\,\psi(s_0,u_0)$ and $p(s_t|s_{t-1})\,p'(u_t|u_{t-1})\,\psi(s_t,u_t)$, and sum via the junction tree algorithm. Evaluate a total of $O(T|s||u|)$ subkernels. Clique chain: $(u_0,s_0)-(u_1,s_1)-(u_2,s_2)-(u_3,s_3)-(u_4,s_4)$.
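A sketch of this idea (my own implementation, not the lecture's code): the sum over paired hidden paths (S, U) is carried by a forward recursion over cliques (s_t, u_t), so only T·|s|·|u| emission subkernels c[s,u] are ever needed; c is assumed precomputed (e.g. Gaussian overlap integrals between emission densities):

```python
# Expected likelihood kernel between two HMMs via a forward recursion
# over paired states, mirroring the junction tree elimination order.
import numpy as np

def hmm_product_kernel(pi_a, A_a, pi_b, A_b, c, T):
    """pi_*: initial state probs, A_*: transition matrices, c[s,u]: subkernels."""
    alpha = np.outer(pi_a, pi_b) * c                 # psi(s_0, u_0) weighted pairs
    for _ in range(1, T):
        # propagate both chains one step, then apply the emission subkernel
        alpha = (A_a.T @ alpha @ A_b) * c
    return alpha.sum()                               # sum over final (s_T, u_T)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
c = np.array([[0.9, 0.1], [0.2, 0.8]])               # toy emission overlaps
print(hmm_product_kernel(pi, A, pi, A, c, T=5))
```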
28 Sampling Product Kernels Can approximate the probability product kernel via sampling. By definition, generative models can: 1) generate a sample; 2) compute the likelihood of a sample. Thus, approximate the probability product via sampling: $k(\chi,\chi') = k(p,p') = \int p(x)\,p'(x)\,dx \approx \frac{\beta}{N}\sum_{i=1}^N p'(x_i) + \frac{1-\beta}{N'}\sum_{i=1}^{N'} p(x_i')$ where $x_i \sim p(x)$ and $x_i' \sim p'(x)$. Beta controls how much sampling is done from each distribution.
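A sketch of the Monte Carlo estimator for two 1D Gaussians, compared against the exact overlap ∫p p′ dx = N(μ | μ′, σ²+σ′²); all parameter values are toy choices, with β = 0.5 drawing equally from both distributions:

```python
# Monte Carlo estimate of the expected likelihood kernel vs. the exact value.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, s, mu_p, s_p = 0.0, 1.0, 1.0, 1.5
N = 100_000
beta = 0.5

x = rng.normal(mu, s, N)                 # samples from p
x_p = rng.normal(mu_p, s_p, N)           # samples from p'
estimate = beta * norm.pdf(x, mu_p, s_p).mean() + (1 - beta) * norm.pdf(x_p, mu, s).mean()
exact = norm.pdf(mu, mu_p, np.sqrt(s**2 + s_p**2))
print(estimate, exact)                   # estimate approaches exact as N grows
```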