An Introduction to Statistical and Probabilistic Linear Models


An Introduction to Statistical and Probabilistic Linear Models
Maximilian Mozes
Proseminar Data Mining, Fakultät für Informatik, Technische Universität München
June 7, 2017

Introduction
In statistical learning theory, linear models are used for regression and classification tasks.
- What is regression?
- What is classification?
- How can we model such concepts in a mathematical context?

Linear regression

Linear regression - basics
What is regression?
- Approximation of data using a (closed) mathematical expression
- Achieved by estimating the model parameters that best fit the data

Linear regression - example
A company changes the price of their products for the nth time.
- They know how the previous n - 1 price changes affected consumer behavior.
- Using linear regression, they can predict the consumer behavior for the nth price change.

Linear regression - basics
- Let D denote a set of n-dimensional data vectors.
- Let x be an n-dimensional observation.
- How can we approximate x?

Linear regression - basics
Create a linear function
    y(x, w) = w_0 + w_1 x_1 + ... + w_n x_n
that approximates x with weights w.

Example for n = 2:
    y(x, w) = w_0 + w_1 x_1 + w_2 x_2

Linear regression - basics
Problem: the weight parameters w_i enter as a purely linear combination of the inputs.
=> significant limitation!

Idea: use weighted non-linear basis functions φ_j instead:
    y(x, w) = w_0 + Σ_{j=1}^{n} w_j φ_j(x) = w^T φ(x),
where φ = (φ_0, ..., φ_n)^T with φ_0(x) = 1.

Polynomial regression
Example: let φ_j(x) = x^j. Then
    y(x, w) = w_0 + Σ_{j=1}^{n} w_j x^j = w_0 + w_1 x + w_2 x^2 + ... + w_n x^n
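To make the basis-function view concrete, here is a minimal Python/NumPy sketch of polynomial regression; the noisy-sine toy data, the degree, and the use of np.linalg.lstsq are assumptions of this sketch, not part of the slides.

```python
import numpy as np

# Minimal sketch: polynomial regression with basis functions phi_j(x) = x^j.
# The data-generating function and the degree are illustrative assumptions.

def polynomial_design(x, degree):
    """Stack the basis functions phi_0(x)=1, ..., phi_degree(x)=x^degree."""
    return np.vander(x, degree + 1, increasing=True)  # columns: x^0 ... x^degree

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
z = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # noisy targets

Phi = polynomial_design(x, degree=3)
w, *_ = np.linalg.lstsq(Phi, z, rcond=None)  # least-squares weights

y = Phi @ w                                  # model predictions y(x, w)
print("fitted weights:", w)
```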

Polynomial regression
Approximation with 2nd-, 6th-, and 8th-order polynomials (see figures).

Polynomial regression
Problem: overfitting with higher polynomial degree.
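The overfitting effect can be made visible numerically. The sketch below fits polynomials of increasing degree and compares the residual error on the training points with that on held-out points; the toy data, degrees, and sample sizes are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: training vs. held-out error as the degree grows.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, z_train = make_data(10)
x_test, z_test = make_data(100)

for degree in (1, 2, 6, 8):
    Phi = np.vander(x_train, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, z_train, rcond=None)
    rss_train = np.sum((z_train - Phi @ w) ** 2)
    Phi_test = np.vander(x_test, degree + 1, increasing=True)
    rss_test = np.sum((z_test - Phi_test @ w) ** 2)
    # High degrees drive the training error toward zero while the
    # held-out error grows: overfitting.
    print(f"degree {degree}: train RSS {rss_train:.3f}, test RSS {rss_test:.3f}")
```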


Linear classification

Linear classification - basics
What is classification?
- Aims to partition the data into predefined classes
- A class contains observations with similar characteristics

Linear classification - example
We have n different cucumbers and courgettes.
- Each record contains the weight and the texture (smooth, rough).
- We want to predict the correct label without knowing the real label.

Linear classification - basics
- Assume a two-class classification problem.
- How can we categorise the data into predefined classes? (See figures.)

Linear classification - basics
Cucumbers vs. courgettes (figure).

Linear classification - basics
Discriminant functions: given a dataset D, we aim to categorise x into either class C_1 or C_2.
Use y(x) such that x ∈ C_1 if y(x) ≥ 0 and x ∈ C_2 otherwise, with
    y(x) = w^T x + w_0.
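A minimal sketch of such a discriminant in Python; the weight vector w and bias w_0 below are made-up values standing in for trained parameters.

```python
import numpy as np

# Sketch of a two-class linear discriminant; w and w0 are assumed to be
# given (e.g. from training) and are illustrative here.
w = np.array([1.0, -2.0])
w0 = 0.5

def classify(x):
    """Assign x to C1 if y(x) = w^T x + w0 >= 0, else to C2."""
    return "C1" if w @ x + w0 >= 0 else "C2"

print(classify(np.array([3.0, 1.0])))   # y(x) =  1.5 -> C1
print(classify(np.array([0.0, 1.0])))   # y(x) = -1.5 -> C2
```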

Linear classification - basics
Decision boundary H:
    H := {x ∈ D : y(x) = 0}

Non-linearity
However, it often occurs that the data are not linearly separable (see figures).

Non-linearity
Solution: use non-linear basis functions instead!
    y(x) = w^T φ(x), with φ(x) = (φ_0(x), φ_1(x), ..., φ_{M-1}(x))^T
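As an illustration, the sketch below uses an assumed radial feature φ_1(x) = x_1² + x_2², which turns data separated by a circle into a linearly separable problem; the feature map and weights are assumptions of this sketch.

```python
import numpy as np

# Sketch: a radial basis feature makes circularly separated data linearly
# separable. The feature map and the weights are illustrative assumptions.
def phi(x):
    # phi_0 = 1 (bias), phi_1 = squared distance from the origin
    return np.array([1.0, x[0] ** 2 + x[1] ** 2])

w = np.array([-1.0, 1.0])  # decision boundary: x1^2 + x2^2 = 1 (unit circle)

def classify(x):
    return "C1" if w @ phi(x) >= 0 else "C2"

print(classify(np.array([2.0, 0.0])))   # outside the circle -> C1
print(classify(np.array([0.1, 0.2])))   # inside the circle  -> C2
```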

Common classification algorithms
Other commonly-used classification algorithms:
- Naive Bayes classifier
- Logistic regression (Cox, 1958)
- Support vector machines (Vapnik and Lerner, 1963)

Summary
- Linear models: linear combinations of weighted (non-)linear functions
- Regression: approximation of given data using a closed mathematical representation
- Classification: categorisation of data according to individual characteristics and common patterns

Further readings
- J. Aldrich: R. A. Fisher and the Making of Maximum Likelihood 1912-1922. Statistical Science, Vol. 12, No. 3, 1997.
- D. Barber: Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
- T. Hastie, R. Tibshirani and J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer, 2009.
- K. Murphy: Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
- A. Y. Ng and M. I. Jordan: On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. Advances in Neural Information Processing Systems, 2001.

References
- J. Aldrich: R. A. Fisher and the Making of Maximum Likelihood 1912-1922. Statistical Science, Vol. 12, No. 3, 1997.
- D. Barber: Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
- C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
- D. R. Cox: The Regression Analysis of Binary Sequences. Journal of the Royal Statistical Society, Series B, Vol. 20, No. 2, 1958.

References (cont.)
- T. Hastie, R. Tibshirani and J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer, 2009.
- K. Murphy: Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
- F. Rosenblatt: The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, Vol. 65, No. 6, 1958.

Thank you for your attention. Questions?

Backup slides

Sum of least squares

Parameter estimation
Estimating the weight parameters: how do we choose the w_i?
=> Find the set of parameters that maximizes p(D | w).

Sum of least squares
- Method to optimize the weights w
- Aims to minimize the residual sum of squares (RSS):
    RSS(w) = Σ_{i=1}^{N} (z_i - y(x_i, w))^2 = Σ_{i=1}^{N} (z_i - w_0 - Σ_{j=1}^{M-1} x_{ij} w_j)^2

Sum of least squares
RSS can be simplified by using an N × M matrix X with the x_i as rows. Then
    RSS(w) = (z - Xw)^T (z - Xw),
where z is a vector of target values.

Sum of least squares
Taking derivatives leads to
    ∂RSS/∂w = -2 X^T (z - Xw)
    ∂²RSS/(∂w ∂w^T) = 2 X^T X
Setting the first derivative to zero and solving for w results in
    ŵ = (X^T X)^{-1} X^T z
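Numerically, the closed-form estimator can be sketched as follows; the random data are illustrative, and the normal equations are solved with np.linalg.solve rather than by explicitly inverting X^T X (which assumes X^T X is positive definite, as noted on the next slide).

```python
import numpy as np

# Sketch of the closed-form least-squares solution w_hat = (X^T X)^-1 X^T z.
rng = np.random.default_rng(2)
N, M = 50, 3
X = rng.normal(size=(N, M))            # design matrix with x_i as rows
w_true = np.array([1.0, -0.5, 2.0])    # illustrative ground-truth weights
z = X @ w_true + rng.normal(scale=0.1, size=N)

# Solve the normal equations; assumes X^T X is positive definite.
w_hat = np.linalg.solve(X.T @ X, X.T @ z)
print("estimated weights:", w_hat)     # close to w_true
```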

Sum of least squares
Note: the approach assumes X^T X to be positive definite. Therefore, X is assumed to have full column rank.

Sum of least squares
The approximation function can be described as
    z_i = y(x_i, w) + ε,
where ε represents the data noise. RSS provides a measurement for the prediction error E_D, defined as
    E_D(w) = (1/2) Σ_{i=1}^{N} (z_i - w^T φ(x_i))^2,
where φ = (φ_0, ..., φ_{M-1})^T.

Maximum Likelihood Estimation

Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE)
- Introduced by Fisher (1922)
- Commonly-used method to optimize the model parameters

Maximum Likelihood Estimation
Goal: find the optimal parameters w such that p(D | w) is maximized, i.e.
    ŵ ∈ argmax_w p(D | w).
For target values z_i, the p(z_i | x_i, w) are assumed to be independent, such that
    p(D | w) = Π_{i=1}^{N} p(z_i | x_i, w)
holds for all z_i ∈ D.

Maximum Likelihood Estimation
The product makes the equation unwieldy. Let's simplify it using the log:
    log p(D | w) = Σ_{i=1}^{N} log p(z_i | x_i, w)
We can now solve ∇ log p(D | w) = 0 for w.
=> ŵ are the optimal weight parameters!

Maximum Likelihood Estimation
Assumption: the data noise ε follows a Gaussian distribution. Then p(D | w) can be described as
    p(z | x, w, β) = N(z | y(x, w), β^{-1}),
where β is the model's inverse variance (precision).

Maximum Likelihood Estimation
For z we get
    log p(z | X, w, β) = Σ_{i=1}^{N} log N(z_i | w^T φ(x_i), β^{-1})

Maximum Likelihood Estimation
It follows that
    ∇ log p(z | X, w, β) = β Σ_{i=1}^{N} (z_i - w^T φ(x_i)) φ(x_i)^T.
Setting the gradient to zero and solving for w leads to
    w_ML = (Φ^T Φ)^{-1} Φ^T z

Maximum Likelihood Estimation
Φ is called the design matrix and is given by

    Φ = | φ_0(x_1)  φ_1(x_1)  ...  φ_{M-1}(x_1) |
        | φ_0(x_2)  φ_1(x_2)  ...  φ_{M-1}(x_2) |
        |    ...       ...    ...       ...     |
        | φ_0(x_N)  φ_1(x_N)  ...  φ_{M-1}(x_N) |
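A small sketch of how Φ might be assembled for the polynomial basis φ_j(x) = x^j and then used to compute w_ML; the basis choice, M, and the toy data are assumptions of this sketch.

```python
import numpy as np

# Sketch: build the design matrix Phi and compute w_ML = (Phi^T Phi)^-1 Phi^T z.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
z = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

M = 4  # number of basis functions phi_0, ..., phi_{M-1} (illustrative)
Phi = np.column_stack([x ** j for j in range(M)])  # Phi[i, j] = phi_j(x_i)

w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ z)
print("w_ML:", w_ml)
```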

Regularization and ridge regression

Ridge regression
- Method to prevent overfitting
- Adds a regularization term that penalises large weights and thereby counteracts overfitting

Ridge regression
Regularization term: (λ/2) ‖w‖²₂ (the factor 1/2 makes the minimizer take the form below).
The error function E(w) becomes
    E(w) = (1/2) Σ_{i=1}^{N} (z_i - w^T φ(x_i))^2 + (λ/2) ‖w‖²₂
Minimizing E(w) and solving for w results in
    ŵ_ridge = (λI + Φ^T Φ)^{-1} Φ^T z
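The ridge estimator in the same sketch style as before; λ, the basis, and the data are illustrative assumptions.

```python
import numpy as np

# Sketch of the ridge estimator w_ridge = (lambda*I + Phi^T Phi)^-1 Phi^T z.
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 10)
z = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

M, lam = 9, 1e-3  # flexible basis, small illustrative regularization strength
Phi = np.column_stack([x ** j for j in range(M)])

# The lam * I term keeps the system well conditioned and shrinks the weights.
w_ridge = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ z)
print("w_ridge:", w_ridge)
```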

Non-linear classification

Non-linearity
In our example, a suitable non-linear feature transformation is
    φ : (x_1, x_2) ↦ (r cos(x_1), r sin(x_2)), r ∈ R,
which makes the classes linearly separable in the transformed space.

Multi-class classification

Multi-class classification
- The discriminant function for two-class classification can be extended to a k-class problem.
- Use the per-class discriminants y_k(x) = w_k^T x + w_{k0}.
- x is assigned to class C_j if y_j(x) > y_i(x) for all i ≠ j.

Multi-class classification
The decision boundary between classes C_j and C_i is then y_j(x) = y_i(x), which can be transformed to
    (w_j - w_i)^T x + w_{j0} - w_{i0} = 0
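A minimal sketch of the resulting k-class decision rule; the weight vectors and biases are made-up stand-ins for trained parameters.

```python
import numpy as np

# Sketch of k-class classification with one linear discriminant per class:
# assign x to the class with the largest y_k(x) = w_k^T x + w_k0.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])      # one weight vector w_k per row (k = 3 classes)
w0 = np.array([0.0, 0.1, -0.2])   # biases w_k0 (illustrative)

def classify(x):
    scores = W @ x + w0           # y_k(x) for all k at once
    return int(np.argmax(scores))

print(classify(np.array([2.0, 0.5])))  # scores [2.0, 0.6, -2.7] -> class 0
```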


Perceptron algorithm

Perceptron
Given an input vector x and a fixed non-linear function φ(x), the class of x is estimated by
    y(x) = f(w^T φ(x)),
where f(t) is called the non-linear activation function:
    f(t) = +1 if t ≥ 0, -1 otherwise.
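A sketch of this decision rule in Python. The mistake-driven weight update used below is the standard perceptron learning rule, which the slides do not spell out; the feature map and the toy data are assumptions.

```python
import numpy as np

# Sketch of the perceptron with the step activation f from the slide.
def f(t):
    return 1 if t >= 0 else -1

def phi(x):
    return np.concatenate(([1.0], x))  # bias feature + raw inputs (assumption)

# Tiny linearly separable toy set: (x, label in {+1, -1})
data = [(np.array([2.0, 1.0]), 1), (np.array([-1.0, -2.0]), -1),
        (np.array([1.5, 0.5]), 1), (np.array([-2.0, -0.5]), -1)]

w = np.zeros(3)
for _ in range(10):                   # a few passes over the data
    for x, label in data:
        if f(w @ phi(x)) != label:    # on a mistake ...
            w += label * phi(x)       # ... standard perceptron update

print([f(w @ phi(x)) for x, _ in data])  # matches the labels after training
```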

Probabilistic generative models

Probabilistic generative models
- Compute the joint probability p(x, z) directly instead of optimizing the weight parameters.
- Apply Bayes' theorem to obtain the posterior p(z | x).

Bayes' theorem
For disjoint events A_1, ..., A_n and a given event B, the probability p(A_i | B), i ∈ {1, ..., n}, can be computed as follows:
    p(A_i | B) = p(B | A_i) p(A_i) / ( Σ_{j=1}^{n} p(B | A_j) p(A_j) )

Probabilistic generative models
Consider a two-class classification problem for C_1 and C_2. Then
    p(C_1 | x) = p(x | C_1) p(C_1) / ( p(x | C_1) p(C_1) + p(x | C_2) p(C_2) ) = 1 / (1 + e^{-a}) = σ(a),
where
    a = log( p(x | C_1) p(C_1) / ( p(x | C_2) p(C_2) ) )   and   σ(a) = 1 / (1 + e^{-a}).
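A numerical sketch of this posterior computation, assuming made-up Gaussian class-conditional densities and equal priors; none of the concrete numbers come from the slides.

```python
import numpy as np

# Sketch: p(C1 | x) = sigma(a) for two classes with assumed Gaussian
# class-conditional densities and priors.
def gauss(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

p_c1, p_c2 = 0.5, 0.5                 # priors (assumption)
x = 0.7
lik1 = gauss(x, mu=0.0, var=1.0)      # p(x | C1), assumed density
lik2 = gauss(x, mu=2.0, var=1.0)      # p(x | C2), assumed density

a = np.log((lik1 * p_c1) / (lik2 * p_c2))
print("p(C1 | x) =", sigma(a))        # posterior probability of class C1
```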

Probabilistic discriminative models

Probabilistic discriminative models
- Predict the correct class by directly computing the posterior probability p(z | x).
- This makes the computation of p(x | z) (Bayes' theorem) redundant.

Probabilistic discriminative models
- Computing the posterior probability p(C_k | x, θ_opt) is then achieved by using MLE.
- Disadvantage: only little knowledge about the given data is required ("black box").

Logistic regression

Logistic regression
- Commonly-used classification algorithm for binary classification problems.
- Assumption: the data noise follows a Bernoulli distribution
    Ber(n) = p^n (1 - p)^{1-n}, n ∈ {0, 1},
as this distribution is more appropriate for a binary classification problem.

Logistic regression
Predict the correct class label based on the probability
    p(C_k | x, w) = Ber(C_k | σ(w^T x)),
where σ is a squashing function (e.g. the sigmoid).
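A minimal sketch of the resulting prediction rule; the weights are assumed to have been obtained by MLE beforehand (logistic regression has no closed-form solution), and all concrete numbers are illustrative.

```python
import numpy as np

# Sketch of the logistic-regression class probability p(C1 | x, w) = sigma(w^T x).
def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

w = np.array([0.8, -1.2])   # assumed trained weights (illustrative)
x = np.array([1.0, 0.5])

p_c1 = sigma(w @ x)         # Bernoulli parameter for class C1
print("p(C1 | x, w) =", p_c1, "-> predict", "C1" if p_c1 >= 0.5 else "C2")
```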
