# Relevance Vector Machines

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 LUT February 21, 2011

2 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise

3 Support Vector Machines The relevance vector machine (RVM) is a bayesian sparse kernel technique for regression and classification Solves some problems with the support vector machines (SVM) Used in detection and classification. Detecting cancer cells, classificating DNA sequences... etc.

4 Support Vector Machines Support Vector Machines (SVM) A non-probabilistic decision machine. Returns point estimate for regression and binary decision for classification. Makes decisions based on the function: y(x; w) = w i K(x, x i ) + w 0 (1) where K is the kernel function and w 0 is the bias. Attempts to minimize the error while simultaneously maximize the margin between the two classes.

5 Support Vector Machines Support Vector Machines (SVM) y = 1 y = 0 y = 1 y = 1 y = 0 y = 1 margin

6 Support Vector Machines SVM Problems The number of required support vectors typically grows linearly with the size of the training set Non-probabilistic predictions. Requires estimation of error/margin trade-off parameters K(x, x i ) must satisfy mercel s condition.

7 Model / Regression Marginal Likelihood Apply bayesian treatment to SVM. Associates a prior over the model weights governed by a set of hyperparameters. Posterior distributions of the majority of weights are peaked around zero. Training vectors associated with the non-zero weights are the relevance vectors. Typically utilizes fewer kernel functions than SVM.

8 The model Outline Model / Regression Marginal Likelihood For given data set of input-target pairs {x n, t n } N n=1 t n = y(x n ; w) + ɛ n (2) where ɛ n are samples from some noise process which is assumed to be mean-zero Gaussian with variance σ 2. Thus, p(t n x) = N (t n y(x n ), σ 2 ) (3)

9 The model (cont.) Outline Model / Regression Marginal Likelihood encode sparsity in the prior. p(w α) = N i=0 which is Gaussian, but conditioned on α. N (w i 0, α 1 i ) (4) we must define hyperpriors over all α m to complete the specification of hierarchical prior: p(w m ) = p(w m α m )p(α m )dα m (5)

10 Regression Outline Model / Regression Marginal Likelihood The model has independent Gaussian noise: t n N (y(x n ; w), σ 2 ) Corresponding likelihood: { p(t w, σ 2 ) = (2πσ 2 ) N/2 exp 1 } t Φw 2 2σ2 (6) where t = (t q,..., t N ), w = (w q,..., w M ) and Φ is the NxM design matrix with Φ n m = φ m (x n )

11 The model (cont.) Outline Model / Regression Marginal Likelihood The desired posterior over all unknowns: p(w, α, σ 2 t) = p(t w, α, σ2 )p(w, α, σ 2 ) p(t) (7) When given a new test point, x, predictions are made for the corresponding target t, in terms of predictive distribution: p(t t) = p(t w, α, σ 2 )p(w, α, σ 2 t)dwdαdσ 2 (8) But we have a problem here. We cannot perform these computations analytically. Approximations are needed.

12 The model (cont.) Outline Model / Regression Marginal Likelihood We need to decompose the posterior as: p(w, α, σ 2 t) = p(w t, α, σ 2 )p(α, σ 2 t) (9) And so, the posterior distribution over the weights is: p(w t, α, σ 2 ) = p(t w, α, σ2 )p(w α) p(t α, σ 2 ) N (w µ, Σ) (10) where Σ = (σ 2 Φ T Φ + A) 1 (11) µ = σ 2 ΣΦ T t (12)

13 Marginal Likelihood Outline Model / Regression Marginal Likelihood Marginal Likelihood can be written as p(t α, σ 2 ) = p(t w, σ 2 )p(w α)dw (13) Maximizing the marginal likelyhood function is known as the type-ii maximum likelihood method. We must optimize p(t α, σ 2 ). There are a few ways to do this.

14 Marginal Likelihood optimization Model / Regression Marginal Likelihood Maximizes (13) with iterative re-estimation. Differentiating logp(t α, σ 2 ) gives iterative re-estimation approach: αi new = γ i µ 2 i (14) (σ 2 ) new t Φµ 2 = N Σ M i=1 γ i where we have defined quantities as γ i = 1 α i Σ ii. γ i is a measure of how well-determined is the parameter w i (15)

15 Model / Regression Marginal Likelihood RVMs for classification The likelihood P(t w) is now Bernoulli: P(t w) = N g{y(x n ; w)} t n[1 g{y(x n ; w)}] 1 tn (16) n=1 with g(y) = 1/(1 + e y ) the sigmoid function. No noise variance, same sparse prior as regression. Unlike regression, The weight posteriors p(w t, α) cannot be obtained analytically. Approximations are once again needed.

16 Model / Regression Marginal Likelihood Gaussian posterior approximation Find posterior mode w M P for current values of α by using optimization Compute Hessian Negate and invert to give the covariance for a gaussian approximation p(w t, α) N (w M P, Σ) α are updated using µ and Σ.

17 Regression RVM Regression Example sinc function: sinc(x) = sin(x)/x Linear spline kernel: K(x m, x n ) = 1 + x m x n + x m x n min(x m, x n ) xm+xn 2 min(x m, x m ) 2 + min(xm,xn)3 3 with ɛ = 0.01, 100 uniform, noise-free samples.

18 RVM Regression Example Regression

19 RVM Regression Example Regression

20 Regression RVM Example Ripley s synthetic data Gaussian kernel: K(x m, x n ) = exp( r 2 ) x m x n 2 with r = 0.5

21 RVM Example Regression

22 Relevance vector machines Exercise Sparsity: the prediction of new inputs depend on the kernel function evaluated at a subset of the training data points. TODO More detailed explanation in the original publication: Tipping M., Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning Research 1, 2001, pp

23 Relevance vector machines Exercise Exercise Fetch Tipping s matlab toolbox for sparse bayes from http: // Try SparseBayesDemo.m with different likelihood models (Gaussian, Bernoulli...) and familiarize yourself with the toolbox Try to replicate results from the regression example.

### Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine

Bayesian Inference: Principles and Practice 3. Sparse Bayesian Models and the Relevance Vector Machine Mike Tipping Gaussian prior Marginal prior: single α Independent α Cambridge, UK Lecture 3: Overview

### Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

### Fast Marginal Likelihood Maximisation for Sparse Bayesian Models

Fast Marginal Likelihood Maximisation for Sparse Bayesian Models Michael E. Tipping and Anita C. Faul Microsoft Research, Cambridge, U.K.....................................................................

### Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

### NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

### CSC 411: Lecture 04: Logistic Regression

CSC 411: Lecture 04: Logistic Regression Raquel Urtasun & Rich Zemel University of Toronto Sep 23, 2015 Urtasun & Zemel (UofT) CSC 411: 04-Prob Classif Sep 23, 2015 1 / 16 Today Key Concepts: Logistic

### Outline Lecture 2 2(32)

Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

### Machine Learning Lecture 7

Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

### Machine Learning. 7. Logistic and Linear Regression

Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,

### Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods

Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter

### GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

### Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

### Outline lecture 2 2(30)

Outline lecture 2 2(3), Lecture 2 Linear Regression it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic Control

### Linear Classification

Linear Classification Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 23 April 2015 Outline Introduction Discriminant Functions Probabilistic Generative Models Probabilistic Discriminative

### Bayesian Logistic Regression

Bayesian Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic Generative

### Bayesian methods in economics and finance

1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

### Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

### Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

### Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

### Modeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop

Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector

### Least Squares Regression

E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

### Linear Models for Classification

Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

### Introduction to Gaussian Process

Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

### Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

### Machine Learning Tom M. Mitchell Machine Learning Department Carnegie Mellon University. September 20, 2012

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 20, 2012 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

### Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP

Slides modified from: PATTERN RECOGNITION AND MACHINE LEARNING CHRISTOPHER M. BISHOP Predic?ve Distribu?on (1) Predict t for new values of x by integra?ng over w: where The Evidence Approxima?on (1) The

### Smooth Bayesian Kernel Machines

Smooth Bayesian Kernel Machines Rutger W. ter Borg 1 and Léon J.M. Rothkrantz 2 1 Nuon NV, Applied Research & Technology Spaklerweg 20, 1096 BA Amsterdam, the Netherlands rutger@terborg.net 2 Delft University

### Bayesian Linear Regression. Sargur Srihari

Bayesian Linear Regression Sargur srihari@cedar.buffalo.edu Topics in Bayesian Regression Recall Max Likelihood Linear Regression Parameter Distribution Predictive Distribution Equivalent Kernel 2 Linear

### 10-701/ Machine Learning, Fall

0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

### Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

### INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first \$10k human genome technology

### Linear Models for Regression CS534

Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

### Linear Models for Regression CS534

Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

### CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

### Bayesian Learning (II)

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

### COS 424: Interacting with Data. Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007

COS 424: Interacting ith Data Lecturer: Rob Schapire Lecture #15 Scribe: Haipeng Zheng April 5, 2007 Recapitulation of Last Lecture In linear regression, e need to avoid adding too much richness to the

### Density Estimation. Seungjin Choi

Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

### PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

### Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm Consistent use of probability theory

### Machine Learning Gaussian Naïve Bayes Big Picture

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 27, 2011 Today: Naïve Bayes Big Picture Logistic regression Gradient ascent Generative discriminative

### Approximate Inference Part 1 of 2

Approximate Inference Part 1 of 2 Tom Minka Microsoft Research, Cambridge, UK Machine Learning Summer School 2009 http://mlg.eng.cam.ac.uk/mlss09/ 1 Bayesian paradigm Consistent use of probability theory

### Machine Learning 2017

Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

### Parameter Expanded Variational Bayesian Methods

Parameter Expanded Variational Bayesian Methods Yuan (Alan) Qi MIT CSAIL 32 Vassar street Cambridge, MA 02139 alanqi@csail.mit.edu Tommi S. Jaakkola MIT CSAIL 32 Vassar street Cambridge, MA 02139 tommi@csail.mit.edu

### Machine Learning Lecture 5

Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

### IN RECENT years, sparse kernel-based learning methods

IEEE TRANSACTIONS ON CYBERNETICS, VOL. 43, NO. 6, DECEMBER 2013 2241 The Relevance Sample-Feature Machine: A Sparse Bayesian Learning Approach to Joint Feature-Sample Selection Yalda Mohsenzadeh, Hamid

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

### Machine Learning, Midterm Exam

10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

### LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input

### Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis

Data Analysis and Manifold Learning Lecture 6: Probabilistic PCA and Factor Analysis Radu Horaud INRIA Grenoble Rhone-Alpes, France Radu.Horaud@inrialpes.fr http://perception.inrialpes.fr/ Outline of Lecture

### Infinite Ensemble Learning with Support Vector Machinery

Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems

### COMS 4771 Regression. Nakul Verma

COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the

### Linear Regression (9/11/13)

STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

### Gaussian Processes (10/16/13)

STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

### Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.

University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x

### > DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

### Adaptive Sparseness Using Jeffreys Prior

Adaptive Sparseness Using Jeffreys Prior Mário A. T. Figueiredo Institute of Telecommunications and Department of Electrical and Computer Engineering. Instituto Superior Técnico 1049-001 Lisboa Portugal

### Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 2, 2015 Today: Logistic regression Generative/Discriminative classifiers Readings: (see class website)

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Naïve Bayes classification

Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

### Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

### Model Selection for Gaussian Processes

Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal

### Introduction to Machine Learning

Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

### LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary

### Linear Regression (continued)

Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

### Automatic Relevance Determination

Automatic Relevance Determination Elia Liitiäinen (eliitiai@cc.hut.fi) Time Series Prediction Group Adaptive Informatics Research Centre Helsinki University of Technology, Finland October 24, 2006 Introduction

### Linear Models in Machine Learning

CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

### 5.6 Nonparametric Logistic Regression

5.6 onparametric Logistic Regression Dmitri Dranishnikov University of Florida Statistical Learning onparametric Logistic Regression onparametric? Doesnt mean that there are no parameters. Just means that

### p L yi z n m x N n xi

y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

### Perceptron Revisited: Linear Separators. Support Vector Machines

Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department

### σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

### COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS16

COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS6 Lecture 3: Classification with Logistic Regression Advanced optimization techniques Underfitting & Overfitting Model selection (Training-

### Variational Bayesian Logistic Regression

Variational Bayesian Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Models for Classification Overview 1. Discriminant Functions 2. Probabilistic

### Today. Calculus. Linear Regression. Lagrange Multipliers

Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject

### COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma

COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released

### Learning with kernels and SVM

Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find

### Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review

### 10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene

### Statistical learning. Chapter 20, Sections 1 3 1

Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

### Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

### Introduction to Machine Learning

1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

### Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar

### Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

### Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

### UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

### MLE/MAP + Naïve Bayes

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders

### Introduction to Machine Learning

Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

### 1 Bayesian Linear Regression (BLR)

Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

### Logistic Regression. Sargur N. Srihari. University at Buffalo, State University of New York USA

Logistic Regression Sargur N. University at Buffalo, State University of New York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative 1. Fixed basis

### Lecture 3: Pattern Classification. Pattern classification

EE E68: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mitures and

### Introduction to Machine Learning

Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:

### y Xw 2 2 y Xw λ w 2 2

CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

### Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version