Ridge Regression. Szymon Bobek. Institute of Applied Computer Science, AGH University of Science and Technology
1 Ridge Regression Szymon Bobek Institute of Applied Computer Science AGH University of Science and Technology Based on Carlos Guestrin and Emily Fox slides from the Coursera Specialization on Machine Learning Szymon Bobek (AGH-UST) Machine Learning 21 March / 62
2 Outline I: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise (Average function; Expected loss) 4 Ridge regression (Keeping model in check (smart way); L2 penalty regularization; L1 penalty lasso; Gradient for L1 regularization; Coordinate gradient descent) 5 Choosing lambda
3 Presentation Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise 4 Ridge regression 5 Choosing lambda
4 Expected value. What is expected value? In probability theory, the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents. For instance, the expected value of a dice roll is 3.5. Why? For discrete random variables: E[X] = Σ_i x_i p(x_i). For equally probable outcomes, it is just an average. For the continuous case: E[X] = ∫ x p(x) dx, where p(x) is the probability density function. Note that usually E[XY] ≠ E[X] E[Y].
5 Expected value. What is expected value? In probability theory, the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents. For instance, the expected value of a dice roll is 3.5. Why? Properties (some): E[c] = c; E[E[X]] = E[X]; E[X + c] = E[X] + c; E[X + Y] = E[X] + E[Y]; E[aX] = a E[X]. Note that usually E[XY] ≠ E[X] E[Y].
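The two facts on these slides (the dice value 3.5 and linearity of expectation) are easy to check numerically. A minimal sketch in Python; the sample size is an arbitrary demo choice:

```python
import random

def empirical_mean(sample):
    """Long-run average: the empirical estimate of E[X]."""
    return sum(sample) / len(sample)

random.seed(0)
N = 200_000
X = [random.randint(1, 6) for _ in range(N)]  # fair die
Y = [random.randint(1, 6) for _ in range(N)]  # second, independent fair die

# Discrete definition: E[X] = sum_i x_i p(x_i) with p(x_i) = 1/6
exact = sum(range(1, 7)) / 6
print(exact)                        # 3.5
print(round(empirical_mean(X), 1))  # ~3.5 for a large sample

# Linearity: E[X + Y] = E[X] + E[Y] (holds even without independence)
lhs = empirical_mean([x + y for x, y in zip(X, Y)])
rhs = empirical_mean(X) + empirical_mean(Y)
print(abs(lhs - rhs) < 1e-9)        # True
```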
6 Presentation Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise 4 Ridge regression 5 Choosing lambda
7 Problem formulation [figure: training points, target t vs. input x]
8 Hypothetical, ideal function and noise [figure: training points t scattered around a smooth curve f(x)]
9 But which is ideal? [figure: several candidate curves f(x) through the same points, t vs. x]
11 Gaussian interpretation [figure: f(x) with the noise distribution around it, t vs. x]
12 Gaussian interpretation. Prediction is a linear function plus noise: y = f(x) + ɛ. We assume that the noise ɛ is drawn from a normal distribution: N(y | µ, σ²) = (1/√(2σ²π)) e^(−(y−µ)²/(2σ²)). We assume that training samples are i.i.d. We want to learn P(y | θ, x, σ²) = (1/√(2σ²π)) e^(−(y−h_θ(x))²/(2σ²)). The best θ is the one that maximizes the probability of the whole training set: arg max_θ P(D | θ, σ²) = arg max_θ (1/√(2σ²π))^N Π_j e^(−(y_j−h_θ(x_j))²/(2σ²)). So what is µ? [figure: data around f(x), with the Gaussian over y at a fixed x]
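Taking the logarithm of the likelihood above turns the product into a sum and answers the slide's question explicitly (a standard derivation, filled in here):

```latex
\log P(D \mid \theta, \sigma^2)
  = N \log\frac{1}{\sqrt{2\sigma^2\pi}}
    - \frac{1}{2\sigma^2}\sum_{j=1}^{N}\bigl(y_j - h_\theta(x_j)\bigr)^2,
\qquad
\arg\max_\theta P(D \mid \theta, \sigma^2)
  = \arg\min_\theta \sum_{j=1}^{N}\bigl(y_j - h_\theta(x_j)\bigr)^2 .
```

So maximizing the likelihood is exactly minimizing the residual sum of squares, and the mean µ of the Gaussian at input x is the model's prediction h_θ(x).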
13 Expected value. f*(x) = E_Y[y | x] [figure: the ideal regression function as the conditional mean of t given x]
14 Expected value. f̄(x) = E_D[E_Y[y | x]] [figure: the average over datasets of the learned functions, t vs. x]
15 Presentation Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise (Average function; Expected loss) 4 Ridge regression 5 Choosing lambda
16 Average of complex function [figure: a complex (high-order) fit to one sampled dataset, t vs. x]
20 Average of complex function. f̄(x) = E_D[E_Y[y | x]] [figure: many complex fits and their average, t vs. x]
22 Average of simple function [figure: a simple (constant) fit to one sampled dataset, t vs. x]
25 Average of simple function. f̄(x) = E_D[E_Y[y | x]] [figure: many simple fits and their average, t vs. x]
27 Bias and variance [two figures: averages of simple fits (left) and complex fits (right), t vs. x]
28 Bias and variance [two figures: simple fits (left), complex fits (right), t vs. x] High bias: the difference from the average to the ideal is large. Low variance: the difference between particular models and their average is low.
29 Bias and variance [two figures: simple fits (left), complex fits (right), t vs. x] High bias: the difference from the average to the ideal is large. Low variance: the difference between particular models and their average is low. Low bias: the difference from the average to the ideal is low. High variance: the difference between particular models and their average is large.
30 Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise (Average function; Expected loss) 4 Ridge regression (Keeping model in check (smart way); L2 penalty regularization; L1 penalty lasso; Gradient for L1 regularization; Coordinate gradient descent) 5 Choosing lambda
31 Loss function. Linear regression objective: maximize the log likelihood, or minimize the RSS: J(θ) = Σ_{i=1}^N {y(x_i) − h_θ(x_i)}². We also know that the data is noisy, y(x) = f(x) + ɛ, so even the ideal function f would still be wrong on the data; we cannot do better than approach f. In the perfect case E_D[h_θ(x; D)] = f(x), but usually not (see the constant function). So the loss that we have impact on can be expressed as the difference between the ideal function and ours, plus the variance of the data: L(y, h_θ(x)) = E_X[{h_θ(x) − f(x)}²] + σ².
32 Expected loss. In general: E[L] = E_D[E_X[L(y, h_θ(x; D))]]. But for our particular case: E[L] = E_X[E_D[{h_θ(x; D) − f(x)}²]] + σ². Let us focus on the big picture: E_D[{h_θ(x; D) − f(x)}²] = ?
33 Bias, variance, noise decomposition. E[L] = E_D[{h_θ(x; D) − E_D[h_θ(x; D)]}²] + {E_D[h_θ(x; D)] − f(x)}² + σ². Bias: average vs. ideal. Variance: average vs. its components. [two figures: simple vs. complex fits, t vs. x]
34 How can we reduce the loss? Problems: P1 Noise: E_D[E_X[{f(x) − y(x; D)}²]]: ideal function (unknown) vs. data (given). Not much we can do. P2 Bias: {E_D[h_θ(x; D)] − f(x)}²: how close we are to the ideal function wrt. model type (i.e. linear, quadratic, polynomial). P3 Variance: E_D[{h_θ(x; D) − E_D[h_θ(x; D)]}²]: how sensitive we are to the training data, how robust the algorithm is; if we get a different dataset, will the model be similar? Solutions: P1: no solution :) P2: if you are far away from the ideal, change the model to a more complex one; more data will help, but not much. P3: if you are very sensitive to data, change the model to a simpler one, or get more data (the average of complex models is close to the ideal).
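The P2/P3 trade-off can be observed directly by simulation: fit many datasets drawn from y = f(x) + ɛ with a simple and a complex model, then measure bias² and variance of the fits. A minimal sketch; the target sin(2πx), the noise level, and the polynomial degrees are made-up demo choices:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # "ideal" function (made up for the demo)
sigma = 0.3                            # noise standard deviation
x = np.linspace(0, 1, 30)              # fixed design points

def fit_many(degree, n_datasets=500):
    """Fit a polynomial of the given degree to many independently sampled datasets."""
    preds = np.empty((n_datasets, x.size))
    for d in range(n_datasets):
        y = f(x) + rng.normal(0, sigma, x.size)   # one dataset D: y = f(x) + eps
        coeffs = np.polyfit(x, y, degree)
        preds[d] = np.polyval(coeffs, x)
    avg = preds.mean(axis=0)                       # E_D[h(x; D)]
    bias2 = np.mean((avg - f(x)) ** 2)             # {E_D[h] - f}^2, averaged over x
    variance = np.mean((preds - avg) ** 2)         # E_D[{h - E_D[h]}^2]
    return bias2, variance

for deg in (0, 9):
    b2, var = fit_many(deg)
    print(f"degree {deg}: bias^2={b2:.3f} variance={var:.3f}")
```

The constant model (degree 0) shows large bias² and tiny variance; the degree-9 model shows the opposite, matching P2 and P3 above.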
35 Bias, variance and error [two figures: error vs. number of training examples, each showing true error and test error curves]
36 Bias, variance and error [two figures: error vs. number of training examples with true error and train error curves; the gap between them is annotated as the variance, the asymptotic train-error level as the bias]
37 OK, so let us get the most complex model ever [figure: error vs. number of training examples; true error and train error curves]
39 Bias, variance, model complexity [figure: prediction error vs. model complexity; underfitting (high bias, low variance) on the left, overfitting (high variance, low bias) on the right; training error keeps decreasing with complexity, while validation error (local CV) and testing error (private LB) reach the lowest generalization error at intermediate complexity]
40 Bias, variance, model complexity [figure: prediction error vs. model complexity, as above] Hmmm... Can we make the algorithm find the balance automatically?
41 Presentation Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise 4 Ridge regression (Keeping model in check (smart way); L2 penalty regularization; L1 penalty lasso; Gradient for L1 regularization; Coordinate gradient descent) 5 Choosing lambda
42 Expected loss revisited. Problems: P1 Noise: E_D[E_X[{f(x) − y(x; D)}²]]: ideal function (unknown) vs. data (given). Not much we can do. P2 Bias: {E_D[h_θ(x; D)] − f(x)}²: how close we are to the ideal function wrt. model type (i.e. linear, quadratic, polynomial). P3 Variance: E_D[{h_θ(x; D) − E_D[h_θ(x; D)]}²]: how sensitive we are to the training data; if we get a different dataset, will the model be similar? Perfect situation: have a very complex model (a lot of flexibility to reduce bias), and let the algorithm keep the variance low. How to keep the variance low? Well... by reducing the space of possible models :)
43 Reducing search space for the model [figures: fits of varying complexity, t vs. x] How to do this? Naive way: all subsets and greedy algorithms (reduce the search space in the feature set). Do not let the coefficients θ grow too much (L2 penalty: reduce the search space in coefficient values). Try to figure out which coefficients θ are not useful and set them to 0 (L1 penalty: a smart naive way).
44 Naive approach. All subsets: Start with no features and measure J(θ). Search over all features and pick the one with the lowest J. Search for the best pair of features over all pairs. Continue until there is no significant improvement in J. Forward/backward stepwise: Start with no features and measure J(θ). Search over all features and pick the one with the lowest J. Keep the previously best feature and select the second best. Continue until there is no significant improvement in J.
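The forward stepwise loop above can be sketched in a few lines of numpy; the toy data, the stopping tolerance, and the use of RSS as J are demo assumptions:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit on the given columns."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ theta
    return float(r @ r)

def forward_stepwise(X, y, rel_tol=0.05):
    """Greedily add the feature that lowers the cost J (here: RSS) the most."""
    selected, remaining = [], list(range(X.shape[1]))
    best = float(y @ y)  # RSS of the empty model (all-zero prediction)
    while remaining:
        scores = {j: rss(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < rel_tol * best:  # no significant improvement
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected

# toy data (made up): only features 0 and 2 actually drive y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0, 0.1, 200)
print(forward_stepwise(X, y))  # the informative features (0 and 2) come out first
```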
45 All subsets, forward/backward stepwise [figure: RSS vs. number of features for models built from #bedrooms, sq. meters, #showers, #floors, year]
47 Why not go brute force? Question: We have a very simple problem (15 possible features). We use linear regression and want to select features with the all-subsets approach. How many models do we have to evaluate?
48 Why not go brute force? Question: We have a very simple problem (15 possible features). We use linear regression and want to select features with the forward stepwise approach. How many models do we have to evaluate?
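The answers are not spelled out on the slides, but they follow directly from counting: all subsets of 15 features means 2^15 models, while forward stepwise fits at most 15 + 14 + ... + 1 models. A quick check:

```python
from math import comb

n = 15
all_subsets = sum(comb(n, k) for k in range(n + 1))  # one model per feature subset
print(all_subsets)   # 32768 == 2**15

stepwise = sum(n - k for k in range(n))              # 15 + 14 + ... + 1
print(stepwise)      # 120
```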
49 How to measure: L1, L2 penalties. Sum of coefficients... no :) (L1) Sum of absolute values of coefficients: L1 = |θ_1| + |θ_2| + ... + |θ_n|. (L2) Sum of squared values of coefficients: L2 = θ_1² + θ_2² + ... + θ_n². (L3, L4) Is there something like that? How to minimize it? Add it to the cost function! But... will the cost function still be convex? How will I calculate the gradient of such a cost function?
50 Linear regression with L2 penalty. Linear regression function: h_θ(x) = Σ_{i=1}^N θ_i x_i = Θx. Cost function: J(θ) = E_x[(h_θ(x) − y(x))²] = (1/2N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i))². Cost function with regularization: J(θ) = (1/2N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i))² + (λ/2N) Σ_{i=2}^N θ_i². [figure: cost contours in the (θ_1, θ_2) plane]
51 Gradient for regularized regression. For j = 0: ∂J(θ)/∂θ_0 = (1/N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i)) x_j^(i). For j ≥ 1: ∂J(θ)/∂θ_j = (1/N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/N) θ_j. [figure: cost contours in the (θ_1, θ_2) plane]
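The two update rules above translate directly into code. A minimal sketch in numpy; the toy data, step size, and iteration count are made-up demo choices, and θ_0 (the intercept) is not penalized, matching the j = 0 case:

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.1, iters=2000):
    """Gradient descent on MSE + L2 penalty; theta[0] (intercept) is not penalized."""
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend x_0 = 1 for the intercept
    theta = np.zeros(d + 1)
    for _ in range(iters):
        err = Xb @ theta - y               # h_theta(x^(i)) - y^(i)
        grad = Xb.T @ err / n              # (1/N) sum (h - y) x_j
        grad[1:] += lam / n * theta[1:]    # + (lambda/N) theta_j for j >= 1
        theta -= alpha * grad
    return theta

# toy data (made up): y = 1 + 2 x1 - x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 300)

theta_small = ridge_gradient_descent(X, y, lam=0.0)
theta_big = ridge_gradient_descent(X, y, lam=1000.0)
print(theta_small.round(2))  # close to [1, 2, -1]
print(theta_big.round(2))    # slopes shrunk toward 0, intercept untouched by the penalty
```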
52 Contour plot [figure] One path of circles represents gradient descent for MSE only. Yellow triangles represent the gradient descent path for the L2 penalty only (λ → ∞). The remaining path represents gradient descent for the combined cost function MSE + L2.
53 Coefficients path [figure: ridge coefficient values as a function of λ]
54 Why not set small coefficients to 0? [figure]
58 Linear regression with L1 penalty. Linear regression function: h_θ(x) = Σ_{i=1}^N θ_i x_i = Θx. Cost function: J(θ) = E_x[(h_θ(x) − y(x))²] = (1/2N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i))². Cost function with L1 penalty: J(θ) = (1/2N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i))² + (λ/2N) Σ_{i=2}^N |θ_i|. [figure: cost contours in the (θ_1, θ_2) plane; the L1 ball has corners on the axes]
59 Contour plot [figure] One path of circles represents gradient descent for MSE only. Yellow triangles represent the gradient descent path for the L1 penalty only (λ → ∞). The remaining path represents gradient descent for the combined cost function MSE + L1.
60 (Sub)gradient for L1 penalty. Cost function with L1 penalty: J(θ) = (1/2N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i))² + (λ/2N) Σ_{i=2}^N |θ_i|. Let us focus on the gradient for a single θ_j: ∂J(θ)/∂θ_j = (1/N) Σ_{i=1}^N (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/2N) ∂|θ_j|/∂θ_j = (1/N) Σ_{i=1}^N [Σ_{k=1}^D θ_k x_k^(i) − y^(i)] x_j^(i) + (λ/2N) ∂|θ_j|/∂θ_j. The problem: |θ_j| is not differentiable at θ_j = 0.
61 Solution: subgradients [figure: a convex f(x) with supporting lines g(x) = f(x_0) + v(x − x_0) passing through (x_0, f(x_0))]
62 Solution: subgradients [figure: f(x) = |x| with supporting lines f(x_0) + v(x − x_0) at x_0 = 0; any slope v ∈ [−1; 1] works]
63 Solution: subgradients. For f(x) = |x|, any slope v ∈ [−1; 1] gives a valid subgradient at x_0 = 0. Using normalized features x̄_j^(i) = x_j^(i) / √(Σ_{i=1}^N (x_j^(i))²), the subgradient of the L1 cost is: ∂J(θ)/∂θ_j = (1/N) Σ_{i=1}^N [Σ_{k=1}^D θ_k x̄_k^(i) − y^(i)] x̄_j^(i) + { −λ/2N if θ_j < 0; [−λ/2N; λ/2N] if θ_j = 0; λ/2N if θ_j > 0 }.
64 Another trick. Pull θ_j out of the prediction: ∂J(θ)/∂θ_j = (1/N) Σ_{i=1}^N [Σ_{k≠j} θ_k x̄_k^(i) + θ_j x̄_j^(i) − y^(i)] x̄_j^(i) + (λ/2N) ∂|θ_j| = (1/N) Σ_{i=1}^N [Σ_{k≠j} θ_k x̄_k^(i) − y^(i)] x̄_j^(i) + θ_j (1/N) Σ_{i=1}^N (x̄_j^(i))² + (λ/2N) ∂|θ_j| = −(1/N) ρ_j + (1/N) θ_j + (λ/2N) ∂|θ_j|, where ρ_j = Σ_{i=1}^N [y^(i) − Σ_{k≠j} θ_k x̄_k^(i)] x̄_j^(i) and, thanks to normalization, Σ_{i=1}^N (x̄_j^(i))² = 1.
65 Sum up. ∂J(θ)/∂θ_j = −(1/N) ρ_j + (1/N) θ_j + { −λ/2N if θ_j < 0; [−λ/2N; λ/2N] if θ_j = 0; λ/2N if θ_j > 0 } = { −λ/2N − (1/N) ρ_j + (1/N) θ_j if θ_j < 0; [−λ/2N − (1/N) ρ_j; λ/2N − (1/N) ρ_j] if θ_j = 0; λ/2N − (1/N) ρ_j + (1/N) θ_j if θ_j > 0 }.
66 Optimal solution = set the (sub)gradient to zero. ∂J(θ)/∂θ_j = 0 means: −λ/2N − (1/N) ρ_j + (1/N) θ_j = 0 if θ_j < 0; the interval [−λ/2N − (1/N) ρ_j; λ/2N − (1/N) ρ_j] has to contain 0 if θ_j = 0; λ/2N − (1/N) ρ_j + (1/N) θ_j = 0 if θ_j > 0.
67 Optimal solution = set the (sub)gradient to zero. −λ/2N − (1/N) ρ_j + (1/N) θ_j = 0 if θ_j < 0; [−λ/2N − (1/N) ρ_j; λ/2N − (1/N) ρ_j] has to contain 0 if θ_j = 0; λ/2N − (1/N) ρ_j + (1/N) θ_j = 0 if θ_j > 0. Therefore: θ_j = ρ_j + λ/2 if ρ_j < −λ/2; θ_j = 0 if ρ_j ∈ [−λ/2; λ/2]; θ_j = ρ_j − λ/2 if ρ_j > λ/2.
68 Coefficients path [figure: lasso coefficient values as a function of λ; coefficients hit exactly 0]
69 Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise (Average function; Expected loss) 4 Ridge regression (Keeping model in check (smart way); L2 penalty regularization; L1 penalty lasso; Gradient for L1 regularization; Coordinate gradient descent) 5 Choosing lambda
70 Optimize one coordinate at a time [figure: coordinate descent path on a contour plot, moving along one axis per step]
71 Limitations [figure]
72 Coordinate descent for simple, ridge and lasso. Algorithm: until not converged: select θ_j (round robin, random, etc.); calculate ρ_j = Σ_{i=1}^N [y^(i) − Σ_{k≠j} θ_k x̄_k^(i)] x̄_j^(i) (thanks to normalization the coefficient of θ_j is Σ_i (x̄_j^(i))² = 1); update θ_j. For simple regression: θ_j = ρ_j. For ridge: θ_j = ρ_j / (2λ + 1). For lasso: θ_j = ρ_j + λ/2 if ρ_j < −λ/2; θ_j = 0 if ρ_j ∈ [−λ/2; λ/2]; θ_j = ρ_j − λ/2 if ρ_j > λ/2.
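The lasso variant of the algorithm fits in a few lines of numpy; a sketch assuming unit-normalized feature columns as above (the toy data and the value of λ are made up):

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Coordinate descent for lasso; columns of X are normalized to unit norm first."""
    Xn = X / np.linalg.norm(X, axis=0)   # x_bar_j = x_j / sqrt(sum_i (x_j^(i))^2)
    theta = np.zeros(X.shape[1])
    for _ in range(sweeps):
        for j in range(X.shape[1]):      # round-robin coordinate selection
            # rho_j: correlation of feature j with the residual computed without it
            resid = y - Xn @ theta + Xn[:, j] * theta[j]
            rho = Xn[:, j] @ resid
            # soft-thresholding update
            if rho < -lam / 2:
                theta[j] = rho + lam / 2
            elif rho > lam / 2:
                theta[j] = rho - lam / 2
            else:
                theta[j] = 0.0
    return theta

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.1, 100)  # features 2 and 3 irrelevant

theta = lasso_coordinate_descent(X, y, lam=2.0)
print(theta.round(2))  # nonzero weights only on features 0 and 1
```

Note there is no step size anywhere: each update jumps straight to the coordinate-wise optimum.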
73 Lasso soft thresholding [figure: the updated θ_j as a function of ρ_j; simple regression is the identity line θ_j = ρ_j, ridge shrinks the line proportionally, and LASSO is flat at 0 for ρ_j ∈ [−λ/2; λ/2] and offset by λ/2 outside that interval]
74 Lasso pros and cons. No step size! Converges to the optimum for strongly convex functions. In some cases it will not converge (the case of non-differentiable cost functions). It shrinks coefficients relative to RSS: it produces more bias, less variance. A common recipe: run lasso to select features, then run simple/ridge regression with only the selected features.
75 Presentation Outline: 1 Expected value 2 Linear regression wrap up 3 Bias, variance, noise 4 Ridge regression 5 Choosing lambda
76 Training/Test/Cross-Validation. All data you have, split 60% / 20% / 20%: fit θ with respect to some λ (training set, 60%); test different λ (validation set, 20%); test the generalization error of θ (test set, 20%).
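A minimal sketch of the 60/20/20 split from the slide (the shuffling seed and toy arrays are arbitrary):

```python
import numpy as np

def train_val_test_split(X, y, seed=0):
    """Shuffle, then split 60/20/20 into train / validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(0.6 * len(y))
    n_val = int(0.2 * len(y))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(200).reshape(100, 2).astype(float)
y = np.arange(100).astype(float)
train, val, test = train_val_test_split(X, y)
print(len(train[1]), len(val[1]), len(test[1]))  # 60 20 20
```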
77 Cross validation techniques: k-fold [figure: all the data split into K folds; each fold plays the test set once]
78 Cross validation techniques: leave-1-out [figure: N iterations, one sample held out as the test set each time]
79 Cross validation techniques: leave-p-out [figure: all C(n, p) held-out test subsets of size p]
80 Cross validation techniques: Monte-Carlo [figure: K random train/test splits of all the data]
81 Cross validation techniques: Bootstrap [figure: from data x_1, ..., x_7, K training sets of the same size are drawn with replacement (e.g. x_1 x_1 x_2 x_5 x_3 x_4 x_6); the samples not drawn form the test set]
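Putting the pieces together: k-fold cross-validation to choose λ for ridge regression. A sketch using the closed-form ridge solution on synthetic data; the λ grid, k = 5, and the omission of an intercept are demo simplifications:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: theta = (X^T X + lam I)^-1 X^T y (no intercept, for brevity)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_cv_error(X, y, lam, k=5):
    """Average validation MSE of ridge with this lambda over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for f in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[f] = False                       # train on everything except fold f
        theta = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[f] @ theta - y[f]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.5, 100)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: kfold_cv_error(X, y, lam) for lam in lambdas}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```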
82 Lasso and ridge vs. normalization. Ridge and lasso put a penalty on the value of θ. The value of θ depends on the magnitude of the gradient. We multiply each gradient by x_j^(i), making θ dependent on the magnitude of x_j^(i). The penalty is therefore dependent on the magnitude of x... :/
83 Lasso and ridge vs. normalization. Normalization: x̄_j^(i) = x_j^(i) / √(Σ_{i=1}^N (x_j^(i))²). When testing and using the model, you need to normalize with the same factors: x̄_j^(test) = x_j^(test) / √(Σ_{i=1}^N (x_j^(i))²).
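In code, the crucial point is to compute the per-feature norms once on the training set and reuse them at test time. A minimal sketch; the feature scales are made up to make the effect visible:

```python
import numpy as np

def fit_normalizer(X_train):
    """Per-feature norms sqrt(sum_i (x_j^(i))^2), computed on the training set only."""
    return np.sqrt((X_train ** 2).sum(axis=0))

def normalize(X, norms):
    """Divide each column by the norm learned on the training set."""
    return X / norms

rng = np.random.default_rng(4)
X_train = rng.normal(size=(50, 3)) * np.array([1.0, 100.0, 0.01])  # wildly different scales
X_test = rng.normal(size=(10, 3)) * np.array([1.0, 100.0, 0.01])

norms = fit_normalizer(X_train)
Xtr = normalize(X_train, norms)
Xte = normalize(X_test, norms)      # same factors; never recompute them on the test set
print(np.linalg.norm(Xtr, axis=0))  # each training column now has unit norm
```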
84 Demo
85 Thank you! Szymon Bobek, Institute of Applied Computer Science, AGH University of Science and Technology, 21 March
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationGWAS IV: Bayesian linear (variance component) models
GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian
More informationLogistic Regression. William Cohen
Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationCSE446: non-parametric methods Spring 2017
CSE446: non-parametric methods Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Linear Regression: What can go wrong? What do we do if the bias is too strong? Might want
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More informationFundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015
Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationLECTURE NOTE #NEW 6 PROF. ALAN YUILLE
LECTURE NOTE #NEW 6 PROF. ALAN YUILLE 1. Introduction to Regression Now consider learning the conditional distribution p(y x). This is often easier than learning the likelihood function p(x y) and the
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationLECTURE 10: LINEAR MODEL SELECTION PT. 1. October 16, 2017 SDS 293: Machine Learning
LECTURE 10: LINEAR MODEL SELECTION PT. 1 October 16, 2017 SDS 293: Machine Learning Outline Model selection: alternatives to least-squares Subset selection - Best subset - Stepwise selection (forward and
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationMachine Learning and Data Mining. Linear regression. Prof. Alexander Ihler
+ Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More informationLinear Models: Comparing Variables. Stony Brook University CSE545, Fall 2017
Linear Models: Comparing Variables Stony Brook University CSE545, Fall 2017 Statistical Preliminaries Random Variables Random Variables X: A mapping from Ω to ℝ that describes the question we care about
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 27 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationIntroduction to Machine Learning Fall 2017 Note 5. 1 Overview. 2 Metric
CS 189 Introduction to Machine Learning Fall 2017 Note 5 1 Overview Recall from our previous note that for a fixed input x, our measurement Y is a noisy measurement of the true underlying response f x):
More informationRidge Regression: Regulating overfitting when using many features. Training, true, & test error vs. model complexity. CSE 446: Machine Learning
Ridge Regression: Regulating overfitting when using many features Emily Fox University of Washington January 3, 207 Training, true, & test error vs. model complexity Overfitting if: Error y Model complexity
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationSupport Vector Machine I
Support Vector Machine I Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative Please use piazza. No emails. HW 0 grades are back. Re-grade request for one week. HW 1 due soon. HW
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 4 Occam s Razor, Model Construction, and Directed Graphical Models https://people.orie.cornell.edu/andrew/orie6741 Cornell University September
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationTDT4173 Machine Learning
TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 4, 2015 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationLinear discriminant functions
Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationOverview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation
Overview Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation Probabilistic Interpretation: Linear Regression Assume output y is generated
More informationAn Introduction to Statistical Machine Learning - Theoretical Aspects -
An Introduction to Statistical Machine Learning - Theoretical Aspects - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny,
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More information