... SPARROW. SPARse approximation Weighted regression. Pardis Noorzad. Department of Computer Engineering and IT Amirkabir University of Technology
|
|
- Kory Walsh
- 6 years ago
- Views:
Transcription
1 ..... SPARROW SPARse approximation Weighted regression Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Université de Montréal March 12, 2012 SPARROW 1/47
2 ..... Outline Introduction Motivation Local Methods SPARROW is a Local Method Defining the Effective Weights Defining the Observation Weights SPARROW 2/47
3 .. Motivation Problem setting... Given D := {(x i, y i ) : i = 1,, N} yi R is the output at the input xi := [x i1,, x im ] T R M our task is to estimate the regression function f : R M R such that y i = f(x i ) + ϵ i the ϵ i s are independent with zero mean. SPARROW 5/47
4 .. Motivation Global methods... In parametric approaches, the regression function is known for e.g., in multiple linear regression (MLR) we assume M f(x 0 ) = β j x 0j + ϵ j=1 we can also add higher order terms but still have a model that is linear in the parameters β j, γ j f(x 0 ) = M j=1 ( βj x 0j + γ j x 2 ) 0j + ϵ SPARROW 6/47
5 .. Motivation Global methods Continued... Example of a global nonparametric approach: ϵ-support vector regression (ϵ-svr) (Smola and Schölkopf, 2004) f(x 0 ) = N β j K(x 0, x i ) + ϵ i=1 SPARROW 7/47
6 .. Local Methods Local methods... A successful nonparametric approach to regression: local estimation (Hastie and Loader, 1993; Härdle and Linton, 1994; Ruppert and Wand, 1994) In local methods: f(x 0 ) = N l i (x 0 )y i + ϵ i=1 SPARROW 9/47
7 .. Local Methods Local methods Continued... For e.g. in k-nearest neighbor regression (k-nnr) f(x 0 ) = N i=1 α i (x 0 ) N p=1 α p(x 0 ) y i where α i (x 0 ) := I Nk (x 0)(x i ) Nk (x 0 ) D is the set of the k-nearest neighbors of x 0 SPARROW 10/47
8 .. Local Methods Local methods Continued... In weighted k-nnr (Wk-NNR), f(x 0 ) = N i=1 α i (x 0 ) N p=1 α p(x 0 ) y i α i (x 0 ) := S(x 0, x i ) 1 I Nk (x 0 )(x i ) S(x0, x i ) = (x 0 x i ) T V 1 (x 0 x i ) is the scaled Euclidean distance SPARROW 11/47
9 .. Local Methods Local methods Continued... Just so you know, here s another example of a local method: additive model (AM) (Buja et al., 1989) f(x 0 ) = M f j (x 0j ) + ϵ j=1 Estimate univariate functions of predictors locally SPARROW 12/47
10 .. Local Methods Local methods Continued... In local methods: estimate the regression function locally by a simple parametric model In local polynomial regression: estimate the regression function locally, by a Taylor polynomial This is what happens in SPARROW, as we will explain SPARROW 13/47
11 .. SPARROW is a Local Method... SPARROW 16/47
12 .. SPARROW is a Local Method I meant this sparrow... SPARROW 17/47
13 .. SPARROW is a Local Method... SPARROW is a local method Before we get into the details, see a few examples showing benefits of local methods then we ll talk about SPARROW SPARROW 18/47
14 Goal function Dataset Figure: Our generated dataset. y i = f(x i ) + ϵ i, where f(x) = (x 3 + x 2 ) I(x) + sin(x) I( x).
15 Dataset Goal function 0.4 MLR:1st MLR:2nd MLR:3rd Figure: Multiple linear regression with first-, second-, and third-order terms.
16 Dataset Goal function ε SVR Figure: ϵ-support vector regression with an RBF kernel.
17 Dataset Goal function 4 NNR Figure: 4-nearest neighbor regression.
18 .. Defining the Effective Weights... Effective weights in SPARROW In local methods: f(x 0 ) = N l i (x 0 )y i + ϵ i=1 Now we define l i (x 0 ) SPARROW 24/47
19 .. Defining the Effective Weights... Local estimation by a Taylor polynomial To locally estimate the regression function near x 0 let us approximate f(x) by a second-degree Taylor polynomial about x 0 P 2 (x) = ϕ + (x x 0 ) T θ (x x 0) T H(x x 0 ) (1) ϕ := f(x 0 ), θ := f(x 0 ) is the gradient of f(x), H := 2 f(x 0 ) is its Hessian both evaluated at x 0 SPARROW 25/47
20 .. Defining the Effective Weights... Local estimation by a Taylor polynomial Continued P 2 (x) = ϕ + (x x 0 ) T θ (x x 0) T H(x x 0 ) We need to solve the locally weighted least squares problem { min α i yi P 2 (x i ) } 2 (2) ϕ,θ,h i Ω SPARROW 26/47
21 .. Defining the Effective Weights... Local estimation by a Taylor polynomial Continued Express (2) as min A 1/2{ y XΘ(x 0 ) } 2 Θ(x 0 ) (3) a ii = α i, y := [y 1, y 2,, y N ] T 1 (x 1 x 0 ) T vech T {(x 1 x 0 )(x 1 x 0 ) T } X :=... 1 (x N x 0 ) T vech T {(x N x 0 )(x N x 0 ) T } parameter supervector: Θ(x 0 ) := [ ϕ, θ, vech(h) ] T SPARROW 27/47
22 .. Defining the Effective Weights... Local estimation by a Taylor polynomial Continued The solution: Θ(x 0 ) = ( X T AX ) 1 X T Ay And so the local quadratic estimate is ˆϕ = ˆf(x 0 ) = e T 1 ( X T AX ) 1 X T Ay Since f(x 0 ) = N i=1 l i(x 0 )y i, the vector of effective weights for SPARROW is [l 1 (x 0 ),, l N (x 0 )] T = A T X ( X T AX ) 1 e1 SPARROW 28/47
23 .. Defining the Effective Weights... Local estimation by a Taylor polynomial Continued The local constant regression estimate is ˆf(x 0 ) = (1 T A1) 1 1 T Ay = N i=1 α i (x 0 ) N k=1 α k(x 0 ) y i. Look familiar? SPARROW 29/47
24 .. Defining the Observation Weights... Observation weights in SPARROW We have to assign the weights here { min α i yi f(x i ) } 2 ϕ,θ,h i Ω that is, the diagonal elements of A min A 1/2{ y XΘ(x 0 ) } 2 Θ(x 0 ) (4) SPARROW 31/47
25 .. Defining the Observation Weights... Observation weights in SPARROW Continued To find α i we solve the following problem (Chen et al., 1995) min α 1 subject to x 0 Dα 2 α R N 2 σ (5) σ > 0 limits [ the maximum approximation ] error and D := x1 x 1, x 2 x 2,, x N x N SPARROW 32/47
26 .. Defining the Observation Weights Power family of penalties l p norms raised to the pth power... ( ) x p p = x i p i (6) For 1 p <, (6) is convex. 0 < p 1, is the range of p useful for measuring sparsity. SPARROW 33/47
27 Figure: As p goes to 0, x p becomes the indicator function and x p becomes a count of the nonzeros in x (Bruckstein et al., 2009).
28 .. Defining the Observation Weights... Representation by sparse approximation Continued To motivate this idea let s look at feature learning with sparse coding, and sparse representation classification (SRC) an example of exemplar-based sparse approximation SPARROW 35/47
29 .. Defining the Observation Weights... Unsupervised feature learning Application to image classification x 0 = Dα An example is the recent work by Coates and Ng (2011). where x 0 is the input vector could be a vectorized image patch, or a SIFT descriptor α is the higher-dimensional sparse representation of x0 D is usually learned SPARROW 36/47
30 Figure: Image classification (Coates et al., 2011).
31 .. Defining the Observation Weights Multiclass classification (Wright et al., 2009)... D := { (x i, y i ) : x i R m, y i {1,, c}, i {1,, N} } Given a test sample x 0 1. Solve min α R N α 1 subject to x 0 Dα 2 2 σ 2. Define {α y : y {1,, c}} where [α y ] i = α i if x i belongs to class y, o.w Construct X (α) := {ˆx y (α) = Dα y, y {1,, c} } 4. Predict ŷ := arg min y {1,...,c} x 0 ˆx y (α) 2 2 SPARROW 38/47
32 Figure: SRC on handwritten image dataset.
33 Figure: SVM with linear kernel on handwritten image dataset.
34 ..... Back to SPARROW with evaluation on the MPG dataset Auto MPG Data Set from the UCI Machine Learning Repository (Frank and Asuncion, 2010) The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 4 continuous attributes. number of instances: 392 number of attributes: 7 (cylinders, displacement, horsepower, weight, acceleration, model year, origin) SPARROW 42/47
35 MSE NWR 4 NNR W4 NNR SPARROW MLR LASSO Figure: Average mean squared error values achieved by various methods over 10-fold cross-validation.
36 ..... Looking ahead What is causing the success of SPARROW and SRC? How important is the bandwidth? What about in SRC? SPARROW 44/47
37 References References I Alfred M. Bruckstein, David L. Donoho, and Michael Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34 81, Andreas Buja, Trevor Hastie, and Robert Tibshirani. Linear smoothers and additive models. The Annals of Statistics, 17(2): , Scott S. Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. Technical Report 479, Department of Statistics, Stanford University, May Adam Coates and Andrew Ng. The importance of encoding versus training with sparse coding and vector quantization. In International Conference on Machine Learning (ICML), pages , Adam Coates, Honglak Lee, and Andrew Y. Ng. Self-taught learning: Transfer learning from unlabeled data. In International Conference on AI and Statistics, SPARROW 45/47
38 References References II A. Frank and A. Asuncion. UCI machine learning repository, URL Wolfgang Härdle and Oliver Linton. Applied nonparametric methods. Technical Report 1069, Yale University, T. J. Hastie and C. Loader. Local regression: Automatic kernel carpentry. Statistical Science, 8(2): , D. Ruppert and M. P. Wand. Multivariate locally weighted least squares regression. The Annals of Statistics, 22: , Alex J. Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14: , August John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31: , SPARROW 46/47
39 References Acknowledgements This is ongoing work carried out under the supervision of Prof. Bob L. Sturm of Aalborg University Copenhagen. Thanks to Isaac Nickaein and Sheida Bijani for helping out with the slides. SPARROW 47/47
Recent Advances in Structured Sparse Models
Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationSemi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion
Semi-supervised ictionary Learning Based on Hilbert-Schmidt Independence Criterion Mehrdad J. Gangeh 1, Safaa M.A. Bedawi 2, Ali Ghodsi 3, and Fakhri Karray 2 1 epartments of Medical Biophysics, and Radiation
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More informationIntroduction to Three Paradigms in Machine Learning. Julien Mairal
Introduction to Three Paradigms in Machine Learning Julien Mairal Inria Grenoble Yerevan, 208 Julien Mairal Introduction to Three Paradigms in Machine Learning /25 Optimization is central to machine learning
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationLast Lecture Recap. UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 6: Regression Models with Regulariza8on
UVA CS 45 - / 65 7 Introduc8on to Machine Learning and Data Mining Lecture 6: Regression Models with Regulariza8on Yanun Qi / Jane University of Virginia Department of Computer Science Last Lecture Recap
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationImproved Local Coordinate Coding using Local Tangents
Improved Local Coordinate Coding using Local Tangents Kai Yu NEC Laboratories America, 10081 N. Wolfe Road, Cupertino, CA 95129 Tong Zhang Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationHierarchical Penalization
Hierarchical Penalization Marie Szafransi 1, Yves Grandvalet 1, 2 and Pierre Morizet-Mahoudeaux 1 Heudiasyc 1, UMR CNRS 6599 Université de Technologie de Compiègne BP 20529, 60205 Compiègne Cedex, France
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationSUPPORT VECTOR MACHINE
SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1 Overview SVM is a huge topic Integration of MMDS, IIR, and Andrew Moore s slides here Our foci: Geometric intuition
More informationMetric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT)
Metric Embedding of Task-Specific Similarity Greg Shakhnarovich Brown University joint work with Trevor Darrell (MIT) August 9, 2006 Task-specific similarity A toy example: Task-specific similarity A toy
More informationA Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence A Local Non-Negative Pursuit Method for Intrinsic Manifold Structure Preservation Dongdong Chen and Jian Cheng Lv and Zhang Yi
More informationCSE446: non-parametric methods Spring 2017
CSE446: non-parametric methods Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Linear Regression: What can go wrong? What do we do if the bias is too strong? Might want
More informationFoundation of Intelligent Systems, Part I. Regression
Foundation of Intelligent Systems, Part I Regression mcuturi@i.kyoto-u.ac.jp FIS-2013 1 Before starting Please take this survey before the end of this week. Here are a few books which you can check beyond
More informationSupport Vector Machines
Support Vector Machines INFO-4604, Applied Machine Learning University of Colorado Boulder September 28, 2017 Prof. Michael Paul Today Two important concepts: Margins Kernels Large Margin Classification
More informationBackpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning)
Backpropagation Rules for Sparse Coding (Task-Driven Dictionary Learning) Julien Mairal UC Berkeley Edinburgh, ICML, June 2012 Julien Mairal, UC Berkeley Backpropagation Rules for Sparse Coding 1/57 Other
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationBack to the future: Radial Basis Function networks revisited
Back to the future: Radial Basis Function networks revisited Qichao Que, Mikhail Belkin Department of Computer Science and Engineering Ohio State University Columbus, OH 4310 que, mbelkin@cse.ohio-state.edu
More informationCompressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation
Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation Allen Y Yang Feb 25, 2008 UC Berkeley What is Sparsity Sparsity A
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationAction Recognition on Distributed Body Sensor Networks via Sparse Representation
Action Recognition on Distributed Body Sensor Networks via Sparse Representation Allen Y Yang June 28, 2008 CVPR4HB New Paradigm of Distributed Pattern Recognition Centralized Recognition
More informationDiscriminative Learning and Big Data
AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Lecture
More informationSolving Regression. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie
Solving Regression Jordan Boyd-Graber University of Colorado Boulder LECTURE 12 Slides adapted from Matt Nedrich and Trevor Hastie Jordan Boyd-Graber Boulder Solving Regression 1 of 17 Roadmap We talked
More informationThe lasso: some novel algorithms and applications
1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,
More informationNonlinear Learning using Local Coordinate Coding
Nonlinear Learning using Local Coordinate Coding Kai Yu NEC Laboratories America kyu@sv.nec-labs.com Tong Zhang Rutgers University tzhang@stat.rutgers.edu Yihong Gong NEC Laboratories America ygong@sv.nec-labs.com
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationL5 Support Vector Classification
L5 Support Vector Classification Support Vector Machine Problem definition Geometrical picture Optimization problem Optimization Problem Hard margin Convexity Dual problem Soft margin problem Alexander
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 27, 2015 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationA Quick Tour of Linear Algebra and Optimization for Machine Learning
A Quick Tour of Linear Algebra and Optimization for Machine Learning Masoud Farivar January 8, 2015 1 / 28 Outline of Part I: Review of Basic Linear Algebra Matrices and Vectors Matrix Multiplication Operators
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationCompressed Sensing and Related Learning Problems
Compressed Sensing and Related Learning Problems Yingzhen Li Dept. of Mathematics, Sun Yat-sen University Advisor: Prof. Haizhang Zhang Advisor: Prof. Haizhang Zhang 1 / Overview Overview Background Compressed
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationSupport Vector Machine Regression for Volatile Stock Market Prediction
Support Vector Machine Regression for Volatile Stock Market Prediction Haiqin Yang, Laiwan Chan, and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,
More informationIntroduction to SVM and RVM
Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationSupport Vector Machines
Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationSEMI-SUPERVISED classification, which estimates a decision
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS Semi-Supervised Support Vector Machines with Tangent Space Intrinsic Manifold Regularization Shiliang Sun and Xijiong Xie Abstract Semi-supervised
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationLearning From Data: Modelling as an Optimisation Problem
Learning From Data: Modelling as an Optimisation Problem Iman Shames April 2017 1 / 31 You should be able to... Identify and formulate a regression problem; Appreciate the utility of regularisation; Identify
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationReducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationCluster Kernels for Semi-Supervised Learning
Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationSMO Algorithms for Support Vector Machines without Bias Term
Institute of Automatic Control Laboratory for Control Systems and Process Automation Prof. Dr.-Ing. Dr. h. c. Rolf Isermann SMO Algorithms for Support Vector Machines without Bias Term Michael Vogt, 18-Jul-2002
More informationIntroduction to machine learning
1/59 Introduction to machine learning Victor Kitov v.v.kitov@yandex.ru 1/59 Course information Instructor - Victor Vladimirovich Kitov Tasks of the course Structure: Tools lectures, seminars assignements:
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationClassification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).
Regression and PCA Classification The goal: map from input X to a label Y. Y has a discrete set of possible values We focused on binary Y (values 0 or 1). But we also discussed larger number of classes
More informationLeast Squares SVM Regression
Least Squares SVM Regression Consider changing SVM to LS SVM by making following modifications: min (w,e) ½ w 2 + ½C Σ e(i) 2 subject to d(i) (w T Φ( x(i))+ b) = e(i), i, and C>0. Note that e(i) is error
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More informationLEC 3: Fisher Discriminant Analysis (FDA)
LEC 3: Fisher Discriminant Analysis (FDA) A Supervised Dimensionality Reduction Approach Dr. Guangliang Chen February 18, 2016 Outline Motivation: PCA is unsupervised which does not use training labels
More informationSupport Vector Machines With Adaptive L q Penalty
DEPARTMENT OF STATISTICS North Carolina State University 50 Founders Drive, Campus Box 803 Raleigh, NC 7695-803 Institute of Statistics Mimeo Series No. 60 April 7, 007 Support Vector Machines With Adaptive
More informationMultiple Kernel Learning
CS 678A Course Project Vivek Gupta, 1 Anurendra Kumar 2 Sup: Prof. Harish Karnick 1 1 Department of Computer Science and Engineering 2 Department of Electrical Engineering Indian Institute of Technology,
More information1 Graph Kernels by Spectral Transforms
Graph Kernels by Spectral Transforms Xiaojin Zhu Jaz Kandola John Lafferty Zoubin Ghahramani Many graph-based semi-supervised learning methods can be viewed as imposing smoothness conditions on the target
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationLearning with Consistency between Inductive Functions and Kernels
Learning with Consistency between Inductive Functions and Kernels Haixuan Yang Irwin King Michael R. Lyu Department of Computer Science & Engineering The Chinese University of Hong Kong Shatin, N.T., Hong
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationClassification and Support Vector Machine
Classification and Support Vector Machine Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Andreas Maletti Technische Universität Dresden Fakultät Informatik June 15, 2006 1 The Problem 2 The Basics 3 The Proposed Solution Learning by Machines Learning
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationSVRG++ with Non-uniform Sampling
SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract
More informationA Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases
2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationarxiv: v5 [stat.me] 12 Jul 2016
Reliable Prediction Intervals for Local Linear Regression Mohammad Ghasemi Hamed a,, Masoud Ebadi Kivaj b a IFSTTAR-COSYS-LIVIC, 25 Allée des Marronniers 78000 Versailles, France b Independent Researcher,
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationRobust Kernel-Based Regression
Robust Kernel-Based Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial
More informationSupport'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan
Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination
More information12. Cholesky factorization
L. Vandenberghe ECE133A (Winter 2018) 12. Cholesky factorization positive definite matrices examples Cholesky factorization complex positive definite matrices kernel methods 12-1 Definitions a symmetric
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html
More informationKernel Logistic Regression and the Import Vector Machine
Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao
More informationLearning Bound for Parameter Transfer Learning
Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter
More information5.6 Nonparametric Logistic Regression
5.6 onparametric Logistic Regression Dmitri Dranishnikov University of Florida Statistical Learning onparametric Logistic Regression onparametric? Doesnt mean that there are no parameters. Just means that
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More information