Linear Models for Regression

1 Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

2 The Regression Problem
Training data: a set of input-output pairs (x_n, t_n), where x_n is the n-th training input and t_n is the target output for x_n.
Goal: learn a function y(x) that can predict the target value t for a new input x.
So far, this is the standard definition of a generic supervised learning problem. What differentiates regression problems is that the target outputs come from a continuous space.

3 A Regression Example
Circles: training data. Red curve: one possible solution, a line. Green curve: another possible solution, a cubic polynomial.

4 Linear Models for Regression
$y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x)$
The functions φ_j are called basis functions. You must decide what these functions should be before you start training; they can be any functions you want.
The parameters w_j are weights, and they are real numbers. The goal of linear regression is to estimate these weights; the output of training is the values of these weights.

5 The Dummy φ_0 Function
$y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(x)$
To simplify notation, we define a "dummy" basis function φ_0(x) = 1. Then, the above formula becomes:
$y(x, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(x)$

6 Dot Product Version
$y(x, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(x) = \mathbf{w}^T \phi(x)$
w is a column vector of weights: $\mathbf{w} = (w_0, w_1, \dots, w_{M-1})^T$.
w^T is the transpose of w: $\mathbf{w}^T = [w_0, w_1, \dots, w_{M-1}]$.
φ(x) is a column vector: $\phi(x) = (\phi_0(x), \phi_1(x), \dots, \phi_{M-1}(x))^T$.
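
As a small illustration (not from the slides; the basis functions and weights below are made up), the dot-product form can be evaluated directly in MATLAB:

    % Minimal sketch (illustrative values): evaluate y(x,w) = w' * phi(x)
    % for a single scalar input x, given a cell array of basis function handles.
    basis = {@(x) 1, @(x) x, @(x) x.^2};     % phi_0, phi_1, phi_2
    w     = [0.5; -1.2; 2.0];                % column vector of weights
    x     = 0.7;
    phi_x = cellfun(@(f) f(x), basis)';      % column vector phi(x) = [1; 0.7; 0.49]
    y     = w' * phi_x;                      % predicted output: 0.5 - 0.84 + 0.98 = 0.64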

7 Common Choices for Basis Functions
Gaussian basis functions:
$\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2 s^2} \right)$

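A minimal MATLAB sketch (with illustrative centers μ_j and width s) that evaluates a few Gaussian basis functions over a grid of inputs:

    % Minimal sketch (illustrative parameters): Gaussian basis functions
    % phi_j(x) = exp(-(x - mu_j)^2 / (2*s^2)), one column per basis function.
    x  = linspace(-1, 1, 200)';        % column of input values
    mu = [-0.5, 0, 0.5];               % centers mu_j (row vector)
    s  = 0.2;                          % shared width parameter
    Phi_gauss = exp(-(x - mu).^2 / (2*s^2));   % 200 x 3 (uses implicit expansion, R2016b+)
    plot(x, Phi_gauss);                % each curve is one phi_j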

9 Common Choices for Basis Functions
Sigmoidal basis functions:
$\phi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right), \quad \text{where } \sigma(a) = \frac{1}{1 + e^{-a}}$
σ(a) is called the logistic sigmoid function. We will see it again in neural networks.


11 Common Choices for Basis Functions
Polynomial basis functions, for example powers of x: φ_j(x) = x^j.
If the basis functions are powers of x, then the regression process fits a polynomial to the data. In other words, the regression process estimates the parameters w_j of a polynomial of degree M-1:
$y(x, \mathbf{w}) = \sum_{j=0}^{M-1} w_j x^j$
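
A corresponding MATLAB sketch for polynomial basis functions (the inputs are illustrative; each column holds one power of x):

    % Minimal sketch (illustrative inputs): polynomial basis functions phi_j(x) = x^j,
    % evaluated for several inputs at once.
    x = [0.1; 0.4; 0.7; 0.9];          % example inputs
    M = 4;                             % number of basis functions (degree M-1 = 3)
    Phi_poly = x .^ (0:M-1);           % column j+1 holds x.^j (implicit expansion)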

12 Global Versus Local Functions
Polynomial basis functions are global functions: their values are far from zero for most of the input range from -∞ to +∞. Changing a single weight w_j affects the output y(x, w) in the entire input range.
Gaussian functions are local functions: their values are practically (but not mathematically) zero for most of the input range from -∞ to +∞. Changing a single weight w_j practically does not affect the output y(x, w) except in a specific small interval.
It is often easier to fit data with local basis functions. Each basis function fits a small region of the input space.

13 Linear Versus Nonlinear Functions
A linear function y(x, w) produces an output that depends linearly on both x and w. Note that polynomial, Gaussian, and sigmoidal basis functions are nonlinear.
If we use nonlinear basis functions, then the regression process produces a function y(x, w) which is linear in w and nonlinear in x.
It is important to remember: linear regression can be used to estimate nonlinear functions of x. It is called linear regression because y is linear in w, NOT because y is linear in x.

14 Solving Regression Problems
There are different methods for solving regression problems. We will study two approaches:
Least squares: find the weights w that minimize the squared error.
Regularized least squares: find the weights w that minimize the squared error plus a weight penalty, controlled by a hand-picked regularization parameter λ.

15 The Gaussian Noise Assumption
Suppose that we want to find the most likely solution: the weights w that maximize the likelihood of the training data. In order to do that, we need to make an additional assumption about the process that generates outputs from inputs. A common approach is to assume a Gaussian noise model:
$t = y(x, \mathbf{w}) + \varepsilon$
In words, t is generated by computing y(x, w) and then adding some noise. The noise ε is a random variable drawn from a zero-mean Gaussian distribution.

16 The Gaussian Noise Assumption
$t = y(x, \mathbf{w}) + \varepsilon$
The noise ε is a random variable drawn from a zero-mean Gaussian distribution. Therefore:
$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$
In the above equation, β is called the precision of the Gaussian, and is defined as the inverse of the variance: β = 1/σ².
The likelihood that input x leads to output t is a Gaussian whose mean is y(x, w) and whose variance is β⁻¹.
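
As a quick numeric illustration of this noise model (all numbers below are made up), the density can be evaluated directly:

    % Minimal sketch (illustrative numbers): evaluate the Gaussian noise model
    % p(t | x, w, beta) = N(t | y(x,w), 1/beta) for one input-output pair.
    y_x  = 1.3;                        % model prediction y(x,w) for some input x
    t    = 1.5;                        % observed target
    beta = 4.0;                        % precision (variance = 1/beta = 0.25)
    p_t  = sqrt(beta / (2*pi)) * exp(-beta * (t - y_x)^2 / 2);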

17 Finding the Most Likely Solution
What is the value of w that is most likely given the data?
Suppose we have a set of training inputs X = {x_1, ..., x_N} and a set of corresponding outputs t = {t_1, ..., t_N}.
We assume that training inputs are independent of each other. We also assume that outputs are conditionally independent of each other, given their inputs and the noise parameter β. Then:
$p(\mathbf{w} \mid X, \beta, \mathbf{t}) = \frac{p(\mathbf{t} \mid X, \mathbf{w}, \beta)\, p(\mathbf{w} \mid X, \beta)}{p(\mathbf{t} \mid X, \beta)}$

18 Finding the Most Likely Solution
$p(\mathbf{w} \mid X, \beta, \mathbf{t}) = \frac{p(\mathbf{t} \mid X, \mathbf{w}, \beta)\, p(\mathbf{w} \mid X, \beta)}{p(\mathbf{t} \mid X, \beta)}$
We assume that, given X and β, all values of w are equally likely. Then, the ratio $\frac{p(\mathbf{w} \mid X, \beta)}{p(\mathbf{t} \mid X, \beta)}$ is a constant that does not depend on w.
Therefore, finding the w that maximizes p(w | X, β, t) is the same as finding the w that maximizes p(t | X, w, β).
p(t | X, w, β) is the likelihood of the training data. So, to find the most likely answer w, we must find the value of w that maximizes the likelihood of the training data.

19 Likelihood of the Training Data
What is the probability of the training data given w?
We assume that outputs are conditionally independent of each other, given their inputs and the noise parameter β. Then:
$p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(x_n, \mathbf{w}), \beta^{-1})$

20 Likelihood of the Training Data
Remember that, using dot product notation, y(x_n, w) = w^T φ(x_n). Therefore:
$p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(x_n, \mathbf{w}), \beta^{-1}) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \phi(x_n), \beta^{-1})$
Our goal is to find the weights w that maximize p(t | X, w, β).

21 Log Likelihood of the Training Data
$p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \phi(x_n), \beta^{-1})$
Maximizing p(t | X, w, β) is the same as maximizing ln p(t | X, w, β), where ln is the natural logarithm:
$\ln p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \ln \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \phi(x_n), \beta^{-1})$

22 Log Likelihood of the Training Data
$\ln p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \ln \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \phi(x_n), \beta^{-1})$
$= \ln \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\beta^{-1}}} \exp\!\left( -\frac{(t_n - \mathbf{w}^T \phi(x_n))^2}{2\beta^{-1}} \right)$
$= \sum_{n=1}^{N} \left[ \ln \sqrt{\frac{\beta}{2\pi}} + \ln \exp\!\left( -\frac{\beta\,(t_n - \mathbf{w}^T \phi(x_n))^2}{2} \right) \right]$

23 Log Likelihood of the Training Data
$\ln p(\mathbf{t} \mid X, \mathbf{w}, \beta) = \sum_{n=1}^{N} \left[ \ln \sqrt{\frac{\beta}{2\pi}} + \ln \exp\!\left( -\frac{\beta\,(t_n - \mathbf{w}^T \phi(x_n))^2}{2} \right) \right]$
$= N \ln \sqrt{\frac{\beta}{2\pi}} - \sum_{n=1}^{N} \frac{\beta\,(t_n - \mathbf{w}^T \phi(x_n))^2}{2}$
Note that $N \ln \sqrt{\beta/(2\pi)}$ is independent of w. Therefore, to maximize ln p(t | X, w, β) we must maximize:
$-\frac{\beta}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$

24 Log Likelihood and Sum-of-Squares Error
To maximize ln p(t | X, w, β) we must maximize:
$-\frac{\beta}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$
Remember that the sum-of-squares error is defined in the textbook as:
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$
Therefore, we want to maximize $-\beta E_D(\mathbf{w})$, which is the same as saying that we want to minimize $E_D(\mathbf{w})$. Therefore, we have proven that the w that maximizes the likelihood of the training data is the same w that minimizes the sum-of-squares error.
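
For concreteness, a small MATLAB sketch (with made-up data) that evaluates this error for a candidate weight vector:

    % Minimal sketch (illustrative data): sum-of-squares error
    % E_D(w) = (1/2) * sum_n (t_n - w' * phi(x_n))^2.
    Phi = [1 0.1; 1 0.4; 1 0.7; 1 0.9];   % row n holds phi(x_n)', here [1, x_n]
    t   = [1.0; 1.2; 1.9; 2.4];           % target outputs t_n
    w   = [0.8; 1.5];                     % some candidate weight vector
    E_D = 0.5 * sum((t - Phi * w).^2);    % sum-of-squares error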

25 Minimizing Sum-of-Squares Error
We want to find the w that minimizes:
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$
E_D(w) is a function mapping an M-dimensional vector to a real number. Remember from calculus how to minimize any such function:
Compute the gradient vector ∇E_D(w). This gradient is an M-dimensional row vector.
Find the values of w that solve the equation ∇E_D(w) = 0. These values of w are possible maxima or minima.

26 Minimizing Sum-of-Squares Error
We want to find the w that minimizes:
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$
To calculate ∇E_D(w), first let's calculate the gradient ∇E_{D,n}(w) of the function:
$E_{D,n}(\mathbf{w}) = (t_n - \mathbf{w}^T \phi(x_n))^2$
E_{D,n}(w) is the composition f(g(w)) of:
$f(x) = x^2, \qquad g(\mathbf{w}) = t_n - \mathbf{w}^T \phi(x_n)$
According to the chain rule: $\nabla (f \circ g) = f'(g)\, \nabla g$.

27 Minimizing Sum-of-Squares Error
$E_{D,n}(\mathbf{w}) = (t_n - \mathbf{w}^T \phi(x_n))^2$
E_{D,n}(w) is the composition f(g(w)) of:
$f(x) = x^2, \qquad g(\mathbf{w}) = t_n - \mathbf{w}^T \phi(x_n)$
According to the chain rule, $\nabla (f \circ g) = f'(g)\, \nabla g$, where:
$f'(x) = 2x, \qquad \nabla g(\mathbf{w}) = -\phi(x_n)^T$
Therefore:
$\nabla E_{D,n}(\mathbf{w}) = f'(g(\mathbf{w}))\, \nabla g(\mathbf{w}) = -2\,(t_n - \mathbf{w}^T \phi(x_n))\, \phi(x_n)^T$

28 Minimizing Sum-of-Squares Error
$E_{D,n}(\mathbf{w}) = (t_n - \mathbf{w}^T \phi(x_n))^2$
$\nabla E_{D,n}(\mathbf{w}) = -2\,(t_n - \mathbf{w}^T \phi(x_n))\, \phi(x_n)^T$
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2 = \frac{1}{2} \sum_{n=1}^{N} E_{D,n}(\mathbf{w})$
Therefore:
$\nabla E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \nabla E_{D,n}(\mathbf{w}) = -\frac{1}{2} \sum_{n=1}^{N} 2\,(t_n - \mathbf{w}^T \phi(x_n))\, \phi(x_n)^T$

29 Minimizing Sum-of-Squares Error
$\nabla E_D(\mathbf{w}) = -\sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))\, \phi(x_n)^T = \sum_{n=1}^{N} (\mathbf{w}^T \phi(x_n) - t_n)\, \phi(x_n)^T = \sum_{n=1}^{N} \mathbf{w}^T \phi(x_n)\, \phi(x_n)^T - \sum_{n=1}^{N} t_n\, \phi(x_n)^T$

30 Minimizing Sum-of-Squares Error
$\nabla E_D(\mathbf{w}) = \sum_{n=1}^{N} \mathbf{w}^T \phi(x_n)\, \phi(x_n)^T - \sum_{n=1}^{N} t_n\, \phi(x_n)^T$
We want to solve the equation ∇E_D(w) = 0. We can simplify expressions using vector and matrix notation:
$\Phi = \begin{pmatrix} \phi_0(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & & \vdots \\ \phi_0(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{pmatrix}$


32 Minimizing Sum-of-Squares Error
$\sum_{n=1}^{N} \phi(x_n)\, \phi(x_n)^T = \sum_{n=1}^{N} \begin{pmatrix} \phi_0(x_n) \\ \vdots \\ \phi_{M-1}(x_n) \end{pmatrix} \begin{pmatrix} \phi_0(x_n) & \cdots & \phi_{M-1}(x_n) \end{pmatrix}$
$= \sum_{n=1}^{N} \begin{pmatrix} \phi_0(x_n)^2 & \phi_0(x_n)\phi_1(x_n) & \cdots & \phi_0(x_n)\phi_{M-1}(x_n) \\ \phi_0(x_n)\phi_1(x_n) & \phi_1(x_n)^2 & \cdots & \phi_1(x_n)\phi_{M-1}(x_n) \\ \vdots & \vdots & & \vdots \\ \phi_0(x_n)\phi_{M-1}(x_n) & \phi_1(x_n)\phi_{M-1}(x_n) & \cdots & \phi_{M-1}(x_n)^2 \end{pmatrix} = \Phi^T \Phi$
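
This identity is easy to check numerically; a small MATLAB sketch with random values (not from the slides):

    % Minimal sketch (random values): numerically check that
    % sum_n phi(x_n) * phi(x_n)' equals Phi' * Phi.
    rng(0);
    Phi = rand(5, 3);                  % 5 examples, 3 basis functions
    S = zeros(3, 3);
    for n = 1:5
        phi_n = Phi(n, :)';            % phi(x_n) as a column vector
        S = S + phi_n * phi_n';        % accumulate the outer products
    end
    max(max(abs(S - Phi' * Phi)))      % should be (numerically) zero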

33 Minimizing Sum-of-Squares Error
$\nabla E_D(\mathbf{w}) = \sum_{n=1}^{N} \mathbf{w}^T \phi(x_n)\, \phi(x_n)^T - \sum_{n=1}^{N} t_n\, \phi(x_n)^T$
With Φ and t defined as above, we can rewrite this expression one term at a time.

34 Minimizing Sum-of-Squares Error
Using $\sum_{n=1}^{N} \phi(x_n)\,\phi(x_n)^T = \Phi^T \Phi$, the first term becomes $\mathbf{w}^T \Phi^T \Phi$:
$\nabla E_D(\mathbf{w}) = \mathbf{w}^T \Phi^T \Phi - \sum_{n=1}^{N} t_n\, \phi(x_n)^T$


36 Minimizing Sum-of-Squares Error
Similarly, $\sum_{n=1}^{N} t_n\, \phi(x_n)^T = \mathbf{t}^T \Phi$, so:
$\nabla E_D(\mathbf{w}) = \mathbf{w}^T \Phi^T \Phi - \mathbf{t}^T \Phi$

37 Minimizing Sum-of-Squares Error
$\nabla E_D(\mathbf{w}) = \mathbf{w}^T \Phi^T \Phi - \mathbf{t}^T \Phi$
$\nabla E_D(\mathbf{w}) = 0 \;\Rightarrow\; \mathbf{w}^T \Phi^T \Phi = \mathbf{t}^T \Phi$
Multiplying both sides by $(\Phi^T \Phi)^{-1}$:
$\mathbf{w}^T = \mathbf{t}^T \Phi\, (\Phi^T \Phi)^{-1}$

38 Minimizing Sum-of-Squares Error
$\mathbf{w}^T = \mathbf{t}^T \Phi\, (\Phi^T \Phi)^{-1}$
Transposing both sides:
$\mathbf{w} = \left( \mathbf{t}^T \Phi\, (\Phi^T \Phi)^{-1} \right)^T$

39 Minimizing Sum-of-Squares Error
$\mathbf{w} = \left( \mathbf{t}^T \Phi\, (\Phi^T \Phi)^{-1} \right)^T$
Using the rule $(AB)^T = B^T A^T$:
$\mathbf{w} = \left( (\Phi^T \Phi)^{-1} \right)^T \Phi^T \mathbf{t}$

40 Minimizing Sum-of-Squares Error
$\mathbf{w} = \left( (\Phi^T \Phi)^{-1} \right)^T \Phi^T \mathbf{t}$
Using the rule $(A^{-1})^T = (A^T)^{-1}$:
$\mathbf{w} = \left( (\Phi^T \Phi)^T \right)^{-1} \Phi^T \mathbf{t}$

41 Minimizing Sum-of-Squares Error
$\mathbf{w} = \left( (\Phi^T \Phi)^T \right)^{-1} \Phi^T \mathbf{t}$
Using the rule $(AB)^T = B^T A^T$, we get $(\Phi^T \Phi)^T = \Phi^T \Phi$, and therefore:
$\mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$

42 Notation: w_ML
From the previous slides, we got the formula:
$\mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$
We denote this value of w as w_ML, because it is the maximum likelihood estimate of w. In other words, w_ML is the most likely value of w given the data. So, we rewrite the formula as:
$\mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$

43 Solving for β
Remember the Gaussian noise model:
$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$
In the above equation, β is the precision of the Gaussian, defined as the inverse of the variance: β = 1/σ².
Given our estimate w_ML, we can also estimate the precision β and the variance σ²:
$\sigma^2_{ML} = \frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} (t_n - \mathbf{w}_{ML}^T \phi(x_n))^2$

44 Least Squares Solution - Summary
$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{M-1} \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \phi_0(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & & \vdots \\ \phi_0(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}$
Given the above notation:
$\mathbf{w}_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$
$\sigma^2_{ML} = \frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} (t_n - \mathbf{w}_{ML}^T \phi(x_n))^2$
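
A minimal end-to-end MATLAB sketch of this summary (synthetic data resembling the sin(2πx) example on the next slides; N, M, and the noise level are illustrative):

    % Minimal sketch: fit a degree-3 polynomial by least squares and
    % estimate the noise variance, on synthetic data.
    rng(1);
    N = 20;  M = 4;
    x = linspace(0, 1, N)';
    t = sin(2*pi*x) + 0.1 * randn(N, 1);         % noisy targets
    Phi = x .^ (0:M-1);                          % N x M design matrix, phi_j(x) = x^j
    w_ml = (Phi' * Phi) \ (Phi' * t);            % same solution as inv(Phi'*Phi)*Phi'*t
    sigma2_ml = mean((t - Phi * w_ml).^2);       % estimate of 1/beta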

45 Numerical Issues
Black circles: training data. Green curve: the true function y(x) = sin(2πx). Red curve: in the top figure, a fitted 7-degree polynomial; in the bottom figure, a fitted 8-degree polynomial. Do you see anything strange?

46 Numerical Issues
The 8-degree polynomial fits the data worse than the 7-degree polynomial. Is this possible?

47 Numerical Issues
The 8-degree polynomial fits the data worse than the 7-degree polynomial. Mathematically, this is impossible: 8-degree polynomials are a superset of 7-degree polynomials. Thus, the best-fitting 8-degree polynomial cannot fit the data worse than the best-fitting 7-degree polynomial.

48 Numerical Issues
The 8-degree polynomial fits the data worse than the 7-degree polynomial. Mathematically, this is impossible. Cause of the problem: numerical issues. Computing the inverse of Φ^T Φ may produce a result that is far from the correct one.

49 Numerical Issues
Work-around in MATLAB: inv(phi' * phi) leads to the incorrect 8-degree polynomial fit (bottom figure), while pinv(phi' * phi) leads to the correct 8-degree polynomial fit (top figure), which is almost identical to the 7-degree result.
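
A self-contained sketch of the two alternatives (synthetic data; the 8-degree polynomial makes phi' * phi badly conditioned):

    % Sketch of the work-around above. inv may give a poor result when
    % phi' * phi is badly conditioned; pinv (pseudo-inverse) is more robust.
    x   = linspace(0, 1, 10)';
    t   = sin(2*pi*x) + 0.1 * randn(10, 1);
    phi = x .^ (0:8);                       % design matrix for an 8-degree polynomial
    w_inv  = inv(phi' * phi)  * phi' * t;   % may be far from the best least squares fit
    w_pinv = pinv(phi' * phi) * phi' * t;   % numerically much more robust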

50 Sequential Learning
The w_ML estimate was obtained by processing the training data as a batch: we use all the data at once to estimate w_ML. However, batch processing can be computationally expensive for large datasets, since it involves multiplications of M×N matrices.
An alternative is sequential learning, also called online learning. In this scenario: we first, somehow, get an initial estimate w^(0). Then, we observe training examples one by one. Every time we observe a new training example, we update the estimate.

51 Sequential Learning
We first, somehow, get an initial estimate w^(0). That can be obtained, for example, by computing w_ML using the first few training examples as a batch, or by initializing w to a random value.
Then, we observe training examples one by one. When we observe the n-th training example, we update the estimate from w^(τ) to w^(τ+1).
Remember that the n-th training example contributes to the overall error E_D(w) a term E_{D,n}(w) defined as:
$E_{D,n}(\mathbf{w}) = (t_n - \mathbf{w}^T \phi(x_n))^2$

52 Sequential Learning
The n-th training example contributes to the overall error E_D(w) a term E_{D,n}(w) defined as:
$E_{D,n}(\mathbf{w}) = (t_n - \mathbf{w}^T \phi(x_n))^2$
When we observe the n-th training example, we update the estimate from w^(τ) to w^(τ+1) as follows:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\, \nabla E_{D,n}(\mathbf{w}^{(\tau)}) = \mathbf{w}^{(\tau)} + \eta\, (t_n - \mathbf{w}^{(\tau)T} \phi(x_n))\, \phi(x_n)$
(The constant factor of 2 in the gradient is absorbed into the learning rate.) η is called the learning rate; it is picked manually. This whole process is called stochastic gradient descent.
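
A minimal sketch of one sequential pass over a synthetic training set (the data and learning rate below are illustrative):

    % Minimal sketch: stochastic gradient descent with the update rule above.
    N = 20;  M = 4;  eta = 0.1;                  % learning rate picked by hand
    x = linspace(0, 1, N)';
    t = sin(2*pi*x) + 0.1 * randn(N, 1);
    Phi = x .^ (0:M-1);                          % polynomial design matrix
    w = zeros(M, 1);                             % initial estimate w^(0)
    for n = 1:N                                  % examples arrive one by one
        phi_n = Phi(n, :)';
        w = w + eta * (t(n) - w' * phi_n) * phi_n;
    end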

53 Sequential Learning - Intuition
When we observe the n-th training example, we update the estimate from w^(τ) to w^(τ+1) as follows:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\, \nabla E_{D,n}(\mathbf{w}^{(\tau)}) = \mathbf{w}^{(\tau)} + \eta\, (t_n - \mathbf{w}^{(\tau)T} \phi(x_n))\, \phi(x_n)$
What is the intuition behind this update? The gradient ∇E_{D,n}(w) is a vector that points in the direction where E_{D,n}(w) increases. Therefore, subtracting a very small amount of ∇E_{D,n}(w) from w should make E_{D,n}(w) a little bit smaller.

54 Sequential Learning - Intuition
When we observe the n-th training example, we update the estimate from w^(τ) to w^(τ+1) as follows:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\, \nabla E_{D,n}(\mathbf{w}^{(\tau)}) = \mathbf{w}^{(\tau)} + \eta\, (t_n - \mathbf{w}^{(\tau)T} \phi(x_n))\, \phi(x_n)$
Choosing a good value for η is important. If η is too small, the minimization may happen too slowly and require too many training examples. If η is too large, the update may overfit the most recent training example, and overall w may fluctuate too much from one update to the next. Unfortunately, picking a good η is more of an art than a science, and involves trial and error.

55 Regularized Least Squares
We estimated w_ML by minimizing the sum-of-squares error E_D(w):
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2$
We have seen in Chapter 1 the regularized version of this error:
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$

56 Solving Regularized Least Squares
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$
The value of w that minimizes E_D(w) is:
$\mathbf{w} = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$
In the above equation, I is the M×M identity matrix.
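
A minimal sketch of this closed-form solution (synthetic data; λ and the polynomial degree are illustrative):

    % Minimal sketch: regularized least squares, w = (lambda*I + Phi'*Phi)^(-1) * Phi' * t.
    N = 20;  M = 9;  lambda = 1e-3;
    x = linspace(0, 1, N)';
    t = sin(2*pi*x) + 0.1 * randn(N, 1);
    Phi = x .^ (0:M-1);
    w_reg = (lambda * eye(M) + Phi' * Phi) \ (Phi' * t);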

57 Why Use Regularization?
$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T \phi(x_n))^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$
We use regularization when we know, in advance, that some solutions should be preferred over other solutions. Then, we add to the error function a term that penalizes undesirable solutions.
In the linear regression case, people know (from past experience) that high values of weights are indicative of overfitting. So, for each weight w_i we add to the error a penalty proportional to (w_i)².

58 Impact of Parameter λ

59 Impact of Parameter λ

60 Impact of Parameter λ
As the previous figures show: low values of λ lead to polynomials whose values fluctuate more and more rapidly, which can lead to increased overfitting. High values of λ lead to flatter and flatter polynomials that look more and more like straight lines, which can lead to increased underfitting, i.e., not fitting the data sufficiently.
