Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint
1 Sparse Kernel Density Estimation Technique Based on Zero-Norm Constraint
Xia Hong (1), Sheng Chen (2), Chris J. Harris (2)
(1) School of Systems Engineering, University of Reading, Reading RG6 6AY, UK. E-mail: x.hong@reading.ac.uk
(2) School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. E-mails: {sqc,cjh}@ecs.soton.ac.uk
International Joint Conference on Neural Networks 2010
2 Outline
1. Motivations: Existing Regularisation Approaches; Our Contributions
2. Proposed Sparse Kernel Density Estimator: Problem Formulation; Approximate Zero-Norm Regularisation; D-Optimality Based Subset Selection
3. Numerical Examples: Experimental Set Up; Experimental Results
4. Conclusions
4 Regularisation Methods
- Two-norm of the weight vector: combines naturally with a quadratic main cost function and admits computationally efficient implementation, but only drives many weights to small, near-zero values.
- One-norm of the weight vector: can drive many weights exactly to zero, and hence should achieve sparser results than the two-norm based method; however, it is harder to minimise and costlier to implement.
- Zero-norm of the weight vector: offers the ultimate model sparsity and generalisation performance, but is intractable to implement; even with approximation it is very difficult to minimise and imposes very high complexity.
- Two-norm and one-norm based regularisations have both been combined with the OLS algorithm, with the former approach providing highly efficient sparse kernel modelling.
6 Our Contributions
- We incorporate an effective approximate zero-norm regularisation into sparse kernel density estimation.
- The approximate zero-norm merges naturally into the underlying constrained nonnegative quadratic programming, so various SVM algorithms can readily be applied to obtain the SKD estimate efficiently.
- Proposed sparse kernel density estimator: first use D-optimality OLS subset selection to select a small number of significant kernels, in terms of kernel eigenvalues; then solve for the final SKD estimate from the associated subset constrained nonnegative quadratic programming.
8 Kernel Density Estimation
Given a finite data set $D_N = \{\mathbf{x}_k\}_{k=1}^{N}$, drawn from an unknown density $p(\mathbf{x})$, where $\mathbf{x}_k \in \mathbb{R}^m$, infer $p(\mathbf{x})$ based on $D_N$ using the kernel density estimate
$$\hat{p}(\mathbf{x}; \boldsymbol{\beta}_N, \rho) = \sum_{k=1}^{N} \beta_k K_\rho(\mathbf{x}, \mathbf{x}_k) \quad \text{s.t. } \beta_k \ge 0,\ 1 \le k \le N,\ \boldsymbol{\beta}_N^T \mathbf{1}_N = 1$$
Here $\boldsymbol{\beta}_N = [\beta_1\ \beta_2 \cdots \beta_N]^T$ is the kernel weight vector, $\mathbf{1}_N$ the vector of ones of dimension $N$, and $K_\rho(\cdot,\cdot)$ the chosen kernel function with kernel width $\rho$.
Unsupervised density estimation → supervised regression using the Parzen window estimate as the desired response.
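As a concrete illustration (not code from the paper), here is a minimal numpy sketch of this estimator assuming a Gaussian kernel of width $\rho$; the function names are ours. The Parzen window estimate is the special case of uniform weights $\beta_k = 1/N$.

```python
import numpy as np

def gaussian_kernel(x, centres, rho):
    """Evaluate K_rho(x, x_k) for every centre x_k (Gaussian kernel, width rho)."""
    d2 = np.sum((centres - x) ** 2, axis=1)      # squared distances ||x - x_k||^2
    m = centres.shape[1]                         # input dimension
    return np.exp(-d2 / (2.0 * rho ** 2)) / (2.0 * np.pi * rho ** 2) ** (m / 2.0)

def kde(x, centres, beta, rho):
    """Kernel density estimate p_hat(x; beta, rho) = sum_k beta_k K_rho(x, x_k)."""
    return beta @ gaussian_kernel(x, centres, rho)

# Parzen window estimate: uniform weights beta_k = 1/N
X = np.random.randn(500, 2)                      # a toy data set D_N
beta_parzen = np.full(len(X), 1.0 / len(X))
print(kde(np.zeros(2), X, beta_parzen, rho=1.1))
```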
9 Regression Formulation
For $\mathbf{x}_k \in D_N$, denote $\hat{y}_k = \hat{p}(\mathbf{x}_k; \boldsymbol{\beta}_N, \rho)$, let $y_k$ be the Parzen window estimate at $\mathbf{x}_k$, and set $\varepsilon_k = y_k - \hat{y}_k$; this yields the regression formulation
$$y_k = \hat{y}_k + \varepsilon_k = \boldsymbol{\phi}_N^T(k)\,\boldsymbol{\beta}_N + \varepsilon_k \quad \text{or, over } D_N,\quad \mathbf{y} = \boldsymbol{\Phi}_N \boldsymbol{\beta}_N + \boldsymbol{\varepsilon}$$
The associated constrained nonnegative quadratic programming is
$$\min_{\boldsymbol{\beta}_N} \left\{ \tfrac{1}{2}\boldsymbol{\beta}_N^T \mathbf{B}_N \boldsymbol{\beta}_N - \mathbf{v}_N^T \boldsymbol{\beta}_N \right\} \quad \text{s.t. } \boldsymbol{\beta}_N^T \mathbf{1}_N = 1 \text{ and } \beta_i \ge 0,\ 1 \le i \le N$$
where $\mathbf{B}_N = \boldsymbol{\Phi}_N^T \boldsymbol{\Phi}_N$ is the design matrix and $\mathbf{v}_N = \boldsymbol{\Phi}_N^T \mathbf{y}$.
Note: this is not using the kernel density estimate to fit the Parzen window estimate!
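Continuing the sketch above, the quantities of this NNQP can be assembled as follows; again an illustrative sketch, where $\boldsymbol{\Phi}_N$ simply collects the kernel responses at the training points.

```python
def design_matrix(X, rho):
    """Phi_N with entries Phi[k, i] = K_rho(x_k, x_i)."""
    return np.stack([gaussian_kernel(x, X, rho) for x in X])

Phi = design_matrix(X, rho=1.1)
y = Phi @ beta_parzen        # desired response: Parzen window estimates at the x_k
B = Phi.T @ Phi              # design matrix B_N
v = Phi.T @ y                # v_N = Phi_N^T y
```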
11 Zero-Norm Constraint
Given $\alpha > 0$, an approximation to the zero norm $\|\boldsymbol{\beta}_N\|_0$ is
$$\|\boldsymbol{\beta}_N\|_0 \approx \sum_{i=1}^{N} \left(1 - e^{-\alpha |\beta_i|}\right)$$
Combining this zero-norm constraint with the constrained NNQP gives
$$\min_{\boldsymbol{\beta}_N} \left\{ \tfrac{1}{2}\boldsymbol{\beta}_N^T \mathbf{B}_N \boldsymbol{\beta}_N - \mathbf{v}_N^T \boldsymbol{\beta}_N + \lambda \sum_{i=1}^{N}\left(1 - e^{-\alpha |\beta_i|}\right) \right\} \quad \text{s.t. } \boldsymbol{\beta}_N^T \mathbf{1}_N = 1 \text{ and } \beta_i \ge 0,\ 1 \le i \le N$$
with $\lambda > 0$ a small regularisation parameter. With a 2nd-order Taylor series expansion for $e^{-\alpha |\beta_i|}$,
$$e^{-\alpha |\beta_i|} \approx 1 - \alpha |\beta_i| + \frac{\alpha^2 \beta_i^2}{2} \quad\Rightarrow\quad \sum_{i=1}^{N}\left(1 - e^{-\alpha |\beta_i|}\right) \approx \alpha \sum_{i=1}^{N} |\beta_i| - \frac{\alpha^2}{2} \sum_{i=1}^{N} \beta_i^2$$
12 Constrained NNQP
Hence the new constrained NNQP is
$$\min_{\boldsymbol{\beta}_N} \left\{ \tfrac{1}{2}\boldsymbol{\beta}_N^T \mathbf{A}_N \boldsymbol{\beta}_N - \mathbf{v}_N^T \boldsymbol{\beta}_N \right\} \quad \text{s.t. } \boldsymbol{\beta}_N^T \mathbf{1}_N = 1 \text{ and } \beta_i \ge 0,\ 1 \le i \le N$$
where $\mathbf{A}_N = \mathbf{B}_N - \delta \mathbf{I}_N$ and $\delta = \lambda\alpha^2$ is a predetermined small parameter.
Remark: under the convexity constraint on $\boldsymbol{\beta}_N$ (the weights are nonnegative and sum to one, so $\alpha\sum_i |\beta_i| = \alpha$ is constant), minimisation of the approximate zero norm is equivalent to maximisation of the two norm $\boldsymbol{\beta}_N^T \mathbf{I}_N \boldsymbol{\beta}_N$.
- The design matrix $\mathbf{B}_N$ should be positive definite, and $\delta$ must be bounded by the smallest eigenvalue of $\mathbf{B}_N$ so that $\mathbf{A}_N$ is also positive definite.
- It is common for the $\mathbf{B}_N$ of a large data set to be ill-conditioned, so the approach is most effective when applied after some model subset selection preprocessing (see the sketch below).
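A sketch of forming $\mathbf{A}_N$ with the eigenvalue check just described; the values of $\lambda$ and $\alpha$ are illustrative only, not from the paper.

```python
lam, alpha = 1e-3, 5.0                       # illustrative values; only delta = lam * alpha**2 matters
delta = lam * alpha ** 2                     # the predetermined small parameter
sigma_min = np.linalg.eigvalsh(B).min()      # smallest eigenvalue of B_N
if delta < sigma_min:
    A = B - delta * np.eye(len(B))           # A_N = B_N - delta * I_N stays positive definite
else:
    # B_N is too ill-conditioned for this delta -- precisely why the
    # D-optimality subset selection below is applied first
    raise ValueError("reduce delta or select a kernel subset first")
```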
14 D-Optimality Design
The least squares estimate $\hat{\boldsymbol{\beta}}_N = \mathbf{B}_N^{-1}\boldsymbol{\Phi}_N^T \mathbf{y}$ is unbiased, and the covariance matrix of the estimate satisfies $\mathrm{Cov}[\hat{\boldsymbol{\beta}}_N] \propto \mathbf{B}_N^{-1}$.
Estimation accuracy depends on the condition number
$$C = \frac{\max\{\sigma_i,\ 1 \le i \le N\}}{\min\{\sigma_i,\ 1 \le i \le N\}}$$
where $\sigma_i$, $1 \le i \le N$, are the eigenvalues of $\mathbf{B}_N$.
D-optimality design maximises the determinant of the design matrix: the selected subset model $\boldsymbol{\Phi}_{N_s}$ maximises $\det(\boldsymbol{\Phi}_{N_s}^T \boldsymbol{\Phi}_{N_s}) = \det(\mathbf{B}_{N_s})$.
This prevents an oversized, ill-posed model and high estimate variances. Unsupervised D-optimality design is particularly suitable for determining the structure of a kernel density estimate.
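Since $\det(\mathbf{B}_{N_s})$ equals the product of the squared norms of the orthogonalised columns of $\boldsymbol{\Phi}_{N_s}$, a greedy orthogonal forward pass (detailed on the next slide) can grow this determinant by repeatedly picking the candidate column with the largest residual energy. A minimal sketch under that interpretation, assuming modified Gram-Schmidt deflation:

```python
def d_optimality_ofr(Phi, n_select):
    """Greedy D-optimality subset selection by orthogonal forward regression.

    det(Phi_s^T Phi_s) is the product of the squared norms of the
    orthogonalised columns, so each step picks the candidate with the
    largest residual energy and deflates the remaining columns.
    """
    R = Phi.astype(float).copy()              # residuals of all candidate columns
    selected = []
    for _ in range(n_select):
        energy = np.sum(R ** 2, axis=0)
        energy[selected] = -np.inf            # never re-select a column
        k = int(np.argmax(energy))
        selected.append(k)
        w = R[:, k] / np.linalg.norm(R[:, k])
        R -= np.outer(w, w @ R)               # remove the new direction from all columns
    return selected

cols = d_optimality_ofr(Phi, n_select=12)     # N_s = 12 is arbitrary here
Phi_s = Phi[:, cols]
```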
15 OFR Aided Algorithm
Orthogonal forward regression selects a $\boldsymbol{\Phi}_{N_s}$ of $N_s$ significant kernels based on the D-optimality criterion; the complexity of this preprocessing is no more than $O(N^2)$.
This preprocessing results in the subset constrained NNQP
$$\min_{\boldsymbol{\beta}_{N_s}} \left\{ \tfrac{1}{2}\boldsymbol{\beta}_{N_s}^T \mathbf{A}_{N_s} \boldsymbol{\beta}_{N_s} - \mathbf{v}_{N_s}^T \boldsymbol{\beta}_{N_s} \right\} \quad \text{s.t. } \boldsymbol{\beta}_{N_s}^T \mathbf{1}_{N_s} = 1 \text{ and } \beta_i \ge 0,\ 1 \le i \le N_s$$
with $\mathbf{v}_{N_s} = \boldsymbol{\Phi}_{N_s}^T \mathbf{y}$, $\mathbf{A}_{N_s} = \mathbf{B}_{N_s} - \delta\mathbf{I}_{N_s}$, $\mathbf{B}_{N_s} = \boldsymbol{\Phi}_{N_s}^T \boldsymbol{\Phi}_{N_s}$, and $\delta < \mathbf{w}_{N_s}^T \mathbf{w}_{N_s}$.
Various SVM algorithms can be used to solve this problem. As $N_s$ is very small and $\mathbf{A}_{N_s}$ is well-conditioned, we use the simple multiplicative nonnegative quadratic programming (MNQP) algorithm, whose complexity is negligible in comparison with the $O(N^2)$ of the D-optimality based OFR preprocessing.
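A sketch of one common form of the multiplicative NNQP update for this problem, with a Lagrange multiplier enforcing the unit-sum constraint; the clipping and renormalisation lines are our safeguards against round-off, not part of the published algorithm.

```python
def mnqp(A, v, n_iter=200):
    """Multiplicative updates for min 1/2 b^T A b - v^T b, s.t. b >= 0, sum(b) = 1."""
    b = np.full(len(v), 1.0 / len(v))         # feasible uniform start
    for _ in range(n_iter):
        c = b / (A @ b)                       # per-component multiplicative factors
        h = (1.0 - c @ v) / c.sum()           # Lagrange multiplier keeping sum(b) = 1
        b = np.maximum(c * (v + h), 0.0)      # update; clip tiny negatives from round-off
        b /= b.sum()                          # renormalise onto the simplex
    return b

A_s = Phi_s.T @ Phi_s - delta * np.eye(len(cols))   # subset A_Ns = B_Ns - delta * I
v_s = Phi_s.T @ y
beta_s = mnqp(A_s, v_s)                       # sparse kernel weights on the selected subset
```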
17 Experimental Setup
The training set contained $N$ randomly drawn samples, while a test set of $N_{\text{test}} = 10{,}000$ samples was used for calculating the $L_1$ test error
$$L_1 = \frac{1}{N_{\text{test}}} \sum_{k=1}^{N_{\text{test}}} \left| p(\mathbf{x}_k) - \hat{p}(\mathbf{x}_k; \boldsymbol{\beta}_N, \rho) \right|$$
between the true density $p(\mathbf{x})$ and the estimate $\hat{p}(\mathbf{x}; \boldsymbol{\beta}_N, \rho)$. A numerical approximation of the Kullback-Leibler divergence (KLD)
$$D_{KL}(p \,\|\, \hat{p}) = \int_{\mathbb{R}^m} p(\mathbf{x}) \log \frac{p(\mathbf{x})}{\hat{p}(\mathbf{x}; \boldsymbol{\beta}_N, \rho)} \, d\mathbf{x}$$
was also used for testing in the 2-D cases.
The proposed SKD estimator is compared with the Parzen window (PW) estimator, our previous SKD estimator, the reduced set density estimator (RSDE), and the Gaussian mixture model (GMM) estimator.
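A sketch of the $L_1$ test error computation, reusing the kde helper from the earlier sketch; p_true stands for the known true density of the benchmark (an assumed callable, defined below for the first example).

```python
def l1_test_error(p_true, X_test, centres, beta, rho):
    """Mean absolute deviation between the true density and the estimate."""
    return float(np.mean([abs(p_true(x) - kde(x, centres, beta, rho))
                          for x in X_test]))
```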
19 First 2-D Example
True density: a mixture of Gaussian and Laplacian distributions,
$$p(x_1, x_2) = \frac{1}{4\pi}\, e^{-\frac{(x_1-2)^2}{2}}\, e^{-\frac{(x_2-2)^2}{2}} + \frac{0.35}{8}\, e^{-0.7|x_1+2|}\, e^{-0.5|x_2+2|}$$
$N = 500$, and the experiment was repeated $N_{\text{run}} = 100$ times.
Performance comparison, $N = 500$, averaged over 100 runs (most numeric entries were lost in transcription; only the surviving figures are shown):

| estimator | PW | previous SKD | RSDE | GMM | proposed SKD |
|---|---|---|---|---|---|
| kernel width | $\rho_{Par} = 0.42$ | $\rho = 1.1$ | $\rho = 1.2$ | tunable | $\rho = 1.1$ |
| $L_1$ | — | — | — | — | — $\pm$ 0.69 |
| KLD | — | — | — | — | — $\pm$ 0.31 |
| kernel number | $N$ (maximum) | — | — | — | — $\pm$ 1.5 (minimum) |

Similar test performance to the existing kernel density estimators, but a sparser estimate.
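For reference, the Gaussian-plus-Laplacian mixture above can be evaluated as follows; the $0.35/8$ factor is what makes the equally weighted mixture integrate to one. This is the p_true callable assumed by the l1_test_error sketch.

```python
def p_true(x):
    """True density of the first 2-D example: Gaussian + Laplacian mixture."""
    g = np.exp(-0.5 * (x[0] - 2) ** 2 - 0.5 * (x[1] - 2) ** 2) / (4.0 * np.pi)
    l = (0.35 / 8.0) * np.exp(-0.7 * abs(x[0] + 2) - 0.5 * abs(x[1] + 2))
    return g + l
```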
20 Second 2-D Example
True density: a mixture of five Gaussian distributions,
$$p(x, y) = \sum_{i=1}^{5} \frac{1}{10\pi}\, e^{-\frac{(x-\mu_{i,1})^2}{2}}\, e^{-\frac{(y-\mu_{i,2})^2}{2}}$$
Five means of the Gaussian distributions: [ ], [ ], [ ], [ ], and [ ].
Performance comparison, $N = 500$, averaged over 100 runs (most numeric entries were lost in transcription; only the surviving figures are shown):

| estimator | PW | previous SKD | RSDE | GMM | proposed SKD |
|---|---|---|---|---|---|
| kernel width | $\rho_{Par} = 0.5$ | $\rho = 1.1$ | $\rho = 1.2$ | tunable | $\rho = 1.0$ |
| $L_1$ | — | — | — | — | — $\pm$ 0.63 |
| KLD | — | — | — | — | — $\pm$ 1.09 |
| kernel number | $N$ (maximum) | — | — | — | — $\pm$ 1.3 (minimum) |

Similar test performance to the existing kernel density estimators, but a sparser estimate.
21 6-D Example
True density: a mixture of three Gaussian distributions,
$$p(\mathbf{x}) = \sum_{i=1}^{3} \frac{1}{3}\, \frac{1}{(2\pi)^{6/2}\, \det^{1/2}(\boldsymbol{\Gamma}_i)}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Gamma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)}$$
with (the entries of the mean vectors were lost in transcription)
$\boldsymbol{\mu}_1 = [\ ]^T$, $\boldsymbol{\Gamma}_1 = \mathrm{diag}\{1.0, 2.0, 1.0, 2.0, 1.0, 2.0\}$
$\boldsymbol{\mu}_2 = [\ ]^T$, $\boldsymbol{\Gamma}_2 = \mathrm{diag}\{2.0, 1.0, 2.0, 1.0, 2.0, 1.0\}$
$\boldsymbol{\mu}_3 = [\ ]^T$, $\boldsymbol{\Gamma}_3 = \mathrm{diag}\{2.0, 1.0, 2.0, 1.0, 2.0, 1.0\}$
The estimation set contained $N = 600$ samples, and the experiment was repeated $N_{\text{run}} = 100$ times.
22 6-D Example Results
Performance comparison, $N = 600$, averaged over 100 runs (most numeric entries were lost in transcription; only the surviving figures are shown):

| estimator | PW | previous SKD | RSDE | GMM | proposed SKD |
|---|---|---|---|---|---|
| kernel width | $\rho_{Par} = 0.65$ | $\rho = 1.2$ | $\rho = 1.2$ | tunable | $\rho = 1.2$ |
| $L_1$ | — | — | — | — | — $\pm$ 0.24 |
| kernel number | $N$ (maximum) | — | — | — | — $\pm$ 1.3 (minimum) |

Similar test performance to the existing kernel density estimators, but a sparser estimate.
23 Conclusions
- We have integrated zero-norm regularisation naturally into the construction of a sparse kernel density estimator, with the classical Parzen window estimate as the desired response.
- The convexity constraint, combined with the zero-norm approximation, turns the problem into a tractable nonnegative quadratic programme.
- D-optimality preprocessing selects a small subset of significant kernels to ensure a well-conditioned solution.
- The complexity compares favourably with existing sparse kernel density estimators.
- The zero-norm regularisation and D-optimality aided estimator offers an efficient means of obtaining very sparse kernel density estimates with excellent generalisation performance.