Learning From Data Lecture 12 Regularization
1 Learning From Data, Lecture 12: Regularization. Constraining the Model; Weight Decay; Augmented Error. M. Magdon-Ismail, CSCI 4100/6100.
2 recap: Overfitting. Fitting the data more than is warranted. [Figure: a noisy data set, the target, and an overfitting fit; legend: Data, Target, Fit.]
3 recap: Noise is Part of y We Cannot Model. Stochastic noise: y = f(x) + stochastic noise. Deterministic noise: y = h*(x) + deterministic noise, where h* is the best approximation to f in H. Stochastic and deterministic noise both hurt learning. Human: good at extracting the simple pattern, ignoring the noise and complications. Computer: pays equal attention to all pixels; needs help simplifying (features, regularization). [Figure: the target f with stochastic noise; f vs. h* for deterministic noise.]
4 Regularization. What is regularization? A cure for our tendency to fit (get distracted by) the noise, hence improving E_out. How does it work? By constraining the model so that we cannot fit the noise: putting on the brakes. Side effects? The medication will have side effects: if we cannot fit the noise, maybe we cannot fit f (the signal)?
5 Constraining the Model: Does it Help? [Figure: an unconstrained fit to the data.] ...and the winner is:
6 Constraining the Model: Does it Help? Constrain the weights to be smaller. [Figure: the unconstrained fit vs. the fit with smaller weights.] ...and the winner is:
7 Bias Goes Up A Little. [Figure: the average fit ḡ(x) against the sine target, without and with regularization.] No regularization: bias = 0.21. Regularization: bias = 0.23, the side effect. (The constant model had bias = 0.5 and var = 0.25.)
8 Variance Drop is Dramatic! [Figure: the spread of fits around ḡ(x) against the sine target, without and with regularization.] No regularization: bias = 0.21, var = 1.69. Regularization: bias = 0.23 (the side effect), var = 0.33 (the treatment). (The constant model had bias = 0.5 and var = 0.25.)
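These bias and variance numbers can be checked with a small simulation. Below is a minimal Monte Carlo sketch, assuming the standard setup from this part of the course (target sin(πx) on [-1, 1], data sets of N = 2 points, a line fit h(x) = w_0 + w_1 x); the value λ = 0.1 is an illustrative assumption, not a number from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)

def fitted_weights(lam, trials=10000):
    """Fit a line to N=2 random points of the target, with weight decay lam."""
    W = np.empty((trials, 2))
    for t in range(trials):
        x = rng.uniform(-1, 1, size=2)
        Z = np.column_stack([np.ones(2), x])       # features (1, x)
        y = f(x)
        # regularized least squares: w = (Z'Z + lam*I)^{-1} Z'y
        W[t] = np.linalg.solve(Z.T @ Z + lam * np.eye(2), Z.T @ y)
    return W

xs = np.linspace(-1, 1, 201)
for lam in (0.0, 0.1):                             # lam = 0.1 is illustrative
    W = fitted_weights(lam)
    G = W @ np.vstack([np.ones_like(xs), xs])      # row t = g_t evaluated on xs
    gbar = G.mean(axis=0)                          # the average hypothesis
    bias = np.mean((gbar - f(xs)) ** 2)
    var = np.mean(G.var(axis=0))
    # note: near-duplicate sample points can inflate the unregularized
    # variance estimate relative to the slide's 1.69
    print(f"lam={lam}: bias ~ {bias:.2f}, var ~ {var:.2f}")
```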
9 Regularization in a Nutshell. VC analysis: E_out(g) ≤ E_in(g) + Ω(H). If you use a simpler H and get a good fit, then your E_out is better. Regularization takes this a step further: if you use a simpler h and get a good fit, then is your E_out better?
10 Polynomials of Order Q: A Useful Testbed. H_Q: polynomials of order Q, fit by linear regression, h(x) = w^T z(x). Standard polynomial transform: z = (1, x, x^2, ..., x^Q), giving h(x) = w_0 + w_1 x + ... + w_Q x^Q. Legendre polynomial transform: z = (1, L_1(x), L_2(x), ..., L_Q(x)), giving h(x) = w_0 + w_1 L_1(x) + ... + w_Q L_Q(x); the orthogonal Legendre basis allows us to treat the weights independently. The first few Legendre polynomials: L_1(x) = x; L_2(x) = (3x^2 - 1)/2; L_3(x) = (5x^3 - 3x)/2; L_4(x) = (35x^4 - 30x^2 + 3)/8; L_5(x) = (63x^5 - 70x^3 + 15x)/8.
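As a quick sanity check of this testbed, here is a small sketch (assuming numpy) that builds the Legendre feature vector z(x) = (1, L_1(x), ..., L_Q(x)) and verifies two of the polynomials listed above.

```python
import numpy as np

Q = 5
x = np.linspace(-1, 1, 5)
# Columns of Z are L_0(x), L_1(x), ..., L_Q(x); each row is z(x)^T.
Z = np.polynomial.legendre.legvander(x, Q)             # shape (len(x), Q+1)
assert np.allclose(Z[:, 2], (3 * x**2 - 1) / 2)        # L_2(x) = (3x^2 - 1)/2
assert np.allclose(Z[:, 3], (5 * x**3 - 3 * x) / 2)    # L_3(x) = (5x^3 - 3x)/2
```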
11 recap: Linear Regression. The data (x_1, y_1), ..., (x_N, y_N) is transformed to (z_1, y_1), ..., (z_N, y_N), collected into the matrix Z (rows z_n^T) and the vector y. Minimize E_in(w) = (1/N) Σ_{n=1}^{N} (w^T z_n - y_n)^2 = (1/N)(Zw - y)^T (Zw - y). The linear regression fit: w_lin = (Z^T Z)^{-1} Z^T y.
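A minimal numpy sketch of this fit on synthetic data; the noisy sine target below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Q = 20, 5
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(N)   # assumed noisy target
Z = np.polynomial.legendre.legvander(x, Q)             # rows are z_n^T
# w_lin = (Z^T Z)^{-1} Z^T y, computed via solve rather than an explicit inverse
w_lin = np.linalg.solve(Z.T @ Z, Z.T @ y)
E_in = np.mean((Z @ w_lin - y) ** 2)
print(f"E_in = {E_in:.4f}")
```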
12 Constraining the Model: H_10 vs. H_2. H_10 = { h(x) = w_0 + w_1 Φ_1(x) + w_2 Φ_2(x) + w_3 Φ_3(x) + ... + w_10 Φ_10(x) }. H_2 = { h(x) = w_0 + w_1 Φ_1(x) + w_2 Φ_2(x) + w_3 Φ_3(x) + ... + w_10 Φ_10(x) such that w_3 = w_4 = ... = w_10 = 0 }, a hard order constraint that sets some weights to zero. H_2 ⊂ H_10.
13 Soft Order Constraint. Don't set weights explicitly to zero (e.g. w_3 = 0). Give a budget and let the learning choose: Σ_{q=0}^{10} w_q^2 ≤ C, where C is the budget for the weights. The soft order constraint allows intermediate models between H_2 and H_10.
14 Soft Order Constrained Model H_C. H_10 = { h(x) = w_0 + w_1 Φ_1(x) + ... + w_10 Φ_10(x) }. H_2 = { h(x) = w_0 + w_1 Φ_1(x) + ... + w_10 Φ_10(x) such that w_3 = w_4 = ... = w_10 = 0 }. H_C = { h(x) = w_0 + w_1 Φ_1(x) + ... + w_10 Φ_10(x) such that Σ_{q=0}^{10} w_q^2 ≤ C }, a soft budget constraint on the sum of squared weights. VC perspective: H_C is smaller than H_10, hence better generalization.
15 Fitting the Data. The optimal regularized weights w_reg ∈ H_C should minimize the in-sample error while staying within the budget: w_reg is the solution to minimize E_in(w) = (1/N)(Zw - y)^T (Zw - y) subject to w^T w ≤ C.
16 Solving For w_reg. Minimize E_in(w) = (1/N)(Zw - y)^T (Zw - y) subject to w^T w ≤ C. Observations: 1. The optimal w tries to get as close to w_lin as possible, so it will use the full budget and lie on the surface w^T w = C. 2. At the optimal w, the surface w^T w = C should be perpendicular to ∇E_in; otherwise we could move along the surface and decrease E_in. 3. The normal to the surface w^T w = C at w is the vector w itself. 4. ∇E_in is perpendicular to the contours of constant E_in, hence parallel to the surface normal, but points in the opposite direction. Therefore ∇E_in(w_reg) = -2λ_C w_reg, where λ_C, the Lagrange multiplier, is positive; the 2 is for mathematical convenience. [Figure: contours E_in = const., w_lin, and the constraint surface w^T w = C with its normal.]
17 Solving For w_reg. E_in(w) is minimized subject to w^T w ≤ C ⟺ ∇E_in(w_reg) + 2λ_C w_reg = 0 ⟺ ∇(E_in(w) + λ_C w^T w) |_{w = w_reg} = 0 ⟺ E_in(w) + λ_C w^T w is minimized unconditionally. There is a correspondence: C ↔ λ_C.
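For readers who prefer algebra to geometry, here is the same correspondence written as a Lagrangian; this is a standard restatement of the slide's argument, not an extra result.

```latex
% Constrained problem and its Lagrangian; the -\lambda_C C term does not
% depend on w, so minimizing over w matches the augmented (penalized) form.
\min_{\mathbf w}\ E_{\mathrm{in}}(\mathbf w)
  \quad \text{subject to} \quad \mathbf w^{\mathsf T}\mathbf w \le C
\qquad\Longleftrightarrow\qquad
\mathcal L(\mathbf w)
  = E_{\mathrm{in}}(\mathbf w)
  + \lambda_C\bigl(\mathbf w^{\mathsf T}\mathbf w - C\bigr),
\quad
\nabla\mathcal L
  = \nabla E_{\mathrm{in}}(\mathbf w) + 2\lambda_C\,\mathbf w = \mathbf 0 .
```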
18 The Augmented Error. Pick a C and minimize E_in(w) subject to w^T w ≤ C. Equivalently, pick a λ_C and minimize E_aug(w) = E_in(w) + λ_C w^T w unconditionally; the added term is a penalty for the complexity of h, measured by the size of the weights. We can pick any budget C; translation: we are free to pick any multiplier λ_C. What's the right C? What's the right λ_C?
19 Linear Regression With Soft Order Constraint. E_aug(w) = (1/N)(Zw - y)^T (Zw - y) + λ_C w^T w. It is convenient to set λ_C = λ/N, so that E_aug(w) = (1/N)[(Zw - y)^T (Zw - y) + λ w^T w]. This is called weight decay, as the penalty encourages smaller weights. Unconditionally minimize E_aug(w).
20 The Solution for w_reg. Up to the constant factor 1/N, ∇E_aug(w) = 2Z^T (Zw - y) + 2λw = 2(Z^T Z + λI)w - 2Z^T y. Set ∇E_aug(w) = 0: w_reg = (Z^T Z + λI)^{-1} Z^T y. λ determines the amount of regularization. Recall the unconstrained solution (λ = 0): w_lin = (Z^T Z)^{-1} Z^T y.
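A minimal numpy sketch of this closed form; the synthetic data and the value of λ are illustrative assumptions.

```python
import numpy as np

def weight_decay_fit(Z, y, lam):
    """w_reg = (Z^T Z + lam * I)^{-1} Z^T y; lam = 0 recovers w_lin."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 30)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(30)  # assumed target
Z = np.polynomial.legendre.legvander(x, 10)
w_reg = weight_decay_fit(Z, y, lam=0.1)
w_lin = weight_decay_fit(Z, y, lam=0.0)
print(np.linalg.norm(w_reg), np.linalg.norm(w_lin))    # regularized norm is smaller
```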
21 A Little Regularization... Minimizing E_in(w) + (λ/N) w^T w with different λ's. [Figure: the fit at λ = 0 (overfitting) vs. the fit at a small λ > 0 ("Wow!"); legend: Data, Target, Fit.]
22 ...Goes A Long Way. Minimizing E_in(w) + (λ/N) w^T w with different λ's. [Figure: λ = 0 (overfitting) vs. a small λ > 0 ("Wow!"), now shown side by side.]
23 Don't Overdose. Minimizing E_in(w) + (λ/N) w^T w with different λ's. [Figure: four fits at λ = 0, a small λ (value unreadable), λ = 0.01, and λ = 1; λ = 0 overfits, λ = 1 underfits.]
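The over- and under-dosing behavior is easy to reproduce. Below is a sketch under assumed settings: a dense fresh test set stands in for E_out, and the λ grid is illustrative (one of the slide's values is unreadable above).

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(np.pi * x)
x = rng.uniform(-1, 1, 15)
y = f(x) + 0.3 * rng.standard_normal(15)               # assumed noise level
xt = np.linspace(-1, 1, 1000)                          # dense "test set"
Z = np.polynomial.legendre.legvander(x, 10)
Zt = np.polynomial.legendre.legvander(xt, 10)
for lam in (0.0, 1e-4, 1e-2, 1.0):                     # illustrative grid
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(11), Z.T @ y)
    print(f"lam={lam:g}: E_test = {np.mean((Zt @ w - f(xt)) ** 2):.3f}")
```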
24 Overfitting and Underfitting. [Figure: expected E_out vs. the regularization parameter λ; E_out first falls (curing overfitting), then rises again (underfitting).]
25 More Noise Needs More Medicine. [Figure: expected E_out vs. the regularization parameter λ for stochastic noise levels σ^2 = 0, 0.25, 0.5; noisier targets benefit from more regularization.]
26 ...Even For Deterministic Noise. [Figures: expected E_out vs. the regularization parameter λ, left for stochastic noise (σ^2 = 0, 0.25, 0.5), right for deterministic noise (target complexity Q_f = 100, Q_f = 30, and a third, unreadable value); more deterministic noise also calls for more regularization.]
27 Variations on Weight Decay. [Figures: expected E_out vs. the regularization parameter λ for each regularizer.] Uniform weight decay: Σ_{q=0}^{Q} w_q^2 (overfitting for small λ, underfitting for large λ). Low order fit: Σ_{q=0}^{Q} q w_q^2. Weight growth: Σ_{q=0}^{Q} 1/w_q^2, which rewards large weights; weight decay outperforms weight growth.
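Uniform decay and the low-order-fit penalty are both instances of a diagonal quadratic penalty Σ_q γ_q w_q^2, which still has a closed-form solution. Below is a sketch; the general diagonal (Tikhonov-style) form is an extrapolation from the two quadratic variations above, and weight growth, which rewards large weights, does not fit this quadratic form.

```python
import numpy as np

def diag_regularized_fit(Z, y, lam, gamma):
    """Minimize (1/N)||Zw - y||^2 + (lam/N) * sum_q gamma_q * w_q^2.
    Closed form: w = (Z^T Z + lam * diag(gamma))^{-1} Z^T y."""
    return np.linalg.solve(Z.T @ Z + lam * np.diag(gamma), Z.T @ y)

Q = 10
gamma_uniform = np.ones(Q + 1)       # uniform weight decay: sum_q w_q^2
gamma_low     = np.arange(Q + 1)     # low order fit: sum_q q * w_q^2
```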
28 Choosing a Regularizer: A Practitioner's Guide. The perfect regularizer: constrain in the direction of the target function. But the target function is unknown (we are going around in circles). The guiding principle: constrain in the direction of smoother (usually simpler) hypotheses; this hurts your ability to fit the high-frequency noise. Smaller weights usually mean smoother and simpler, hence weight decay, not weight growth. What if you choose the wrong regularizer? You still have λ to play with: validation.
29 How Does Regularization Work? Stochastic noise: nothing you can do about that. Good features: help to reduce deterministic noise. Regularization: helps to combat whatever noise remains, especially when N is small. Typical modus operandi: sacrifice a little bias for a huge improvement in var. VC angle: you are using a smaller H without sacrificing too much E_in.
30 Augmented Error as a Proxy for E_out. E_aug(h) = E_in(h) + (λ/N) Ω(h); here Ω(h) was w^T w. Compare the VC bound: E_out(h) ≤ E_in(h) + Ω(H), where Ω(H) was O(sqrt((d_VC/N) ln N)). E_aug can beat E_in as a proxy for E_out, depending on the choice of λ.