Polynomial Regression and Regularization

Size: px
Start display at page:

Download "Polynomial Regression and Regularization"

Transcription

1 Polynomial Regression and Regularization

2 Administrivia o If you still haven t enrolled in Moodle yet Enrollment key on Piazza If joined course recently, me to get added to Piazza o Homework 1 posted later tonight. Good Milestones: Problems 1-3 This Week Problems 4 and 5 Next Week 2

3 The RoadMap o Last Time: Regression Refresher (there was nothing fresh about it) o This Time: Polynomial Regression Regularization (wiggles are bad, Man) o Next Time: Bias-Variance Trade-Off (what does it all MEAN?) 3

4 Previously on CSCI 4622 Given training data (x i1,x i2,...,x ip,y i ) for i =1, 2,...,n fit a regression of the form y i = x i1 + 2 x i2 + + p x ip + i Estimates of the parameters are found by minimizing RSS = Construct the design matrix X For the moment, solve via normal equations: where i N(0, 2 ) nx [( x i1 + + p x ip ) y i ] 2 = kx yk 2 i=1 by prepending column of 1s to data matrix ˆ = f X T X 1 X T y LOSS Function 4

5 So, We Can Model Things Like This 5

6 And Things Like This Y X 2 X 1 6

7 But What About Things Like This? 7

8 But What About Things Like This? o So yeah, nonlinearity is a thing o Clearly, the linear models of regression cannot handle this. o Give me Neural Networks or Give me Death! 8

9 But What About Things Like This? o So yeah, nonlinearity is a thing o Clearly, the linear models of regression cannot handle this. o Give me Neural Networks or Give me Death! o Nah. Regression can totally do this 9

10 Start with the Obvious: Polynomials o Suppose, as in the previous pictures, we have a single feature X o We want to go from this: Y = X + o To something like this: Y = X + 2 X p X p + o So what is the difference between these two? 10

11 Start with the Obvious: Polynomials o Suppose, as in the previous pictures, we have a single feature X o We want to go from this: Y = X + o To something like this: Y = X + 2 X p X p + o So what is the difference between these two? o Seems like it s closer to this: Y = X X p X p + 11

12 The One Becomes the Many o Start with a single feature X o And derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o These two things: Y = X + 2 X p X p + o are exactly the same: Y = X X p X p + 12

13 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? 13

14 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? Before: x 11 x 21 x 1p 1 x 21 x 22 x 2p 1 x 31 x 32 x 3p x n1 x n2 x np p 3 2 = y 1 y 2 y 3. y n

15 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? After: x 1 x 2 1 x p 1 1 x 2 x 2 2 x p 2 1 x 3 x 2 2 x p x n x 2 n x p n p 3 2 = y 1 y 2 y 3. y n

16 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? HAVE TEST point Xo NEED to PERFORM SAME Transformation on Xo THAT WE PERFORMED data. xo ( 1,., On X5,...,XoP) THEN PRE Pichon IS just o= Xotp training 16

17 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? o Important Theme #1: If you transform your features for training, predicting is always as easy transforming the features of your new data in the same way. Input space - FEATURE SPACE Training X X, Xn,... XP Test Xo Xo MAGIC, Xp,.. Happens, XOP HERE! 17

18 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? o Important Theme #1: If you transform your features for training, predicting is always as easy transforming the features of your new data in the same way. o Important Theme #2: If your current features are not flexible enough, moving into a higher dimensional space will make your model more flexible (but BEWARE!) 18

19 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? 19

20 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 1 20

21 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 4 21

22 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 7 22

23 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 11 23

24 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 11 o What Questions should we be asking? 24

25 The Real ML Pipeline (Almost) The more flexible (powerful) your model gets, the better it ll do on training set But that s not useful. We want to know how model will do on new data How well it ll Generalize A better pipeline: RANDOM on TRAIN TEST on TH 'S THIS SPLIT ) ) L L All Labeled Data Training Set Validation Set Train on training set. Validate (or evaluate) on validation set. HELD DATA - OUT 25

26 The Real ML Pipeline (Almost) Need a performance measure: For Regression, we use the Mean-Squared Error: MSE = 1 n nx (y i ŷ i ) 2 I=1 - y TRUE RESPONSE predictions All Labeled Data Training Set Validation Set 26

27 Example: Sinusoidal Data What happens to the MSE on training set as p grows? What happens to MSE on validation set as p grows? MSE goes DOWNMSE goes Down At±T, then BACK-Up! 27

28 Example: Sinusoidal Data FLEXIBLE Enough MODEL / OUERFITT HAS STARTED ' ng 28

29 Example: Sinusoidal Data Revisit Polynomial Regression What causes those wiggles that are hurting us so bad? 29

30 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 30

31 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 31

32 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 32

33 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 33

34 Example: Sinusoidal Data Coefficients grow very large. Need a way to control them. Regulate them even 34

35 Regularization Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much minimize RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 o Regularization weight of is a tuning parameter px k=1 2 k Tenafly o Have to do regularization study to choose best value 35

36 Regularization Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 px k=1 2 k o Why do we not regularize the bias parameter? BIAS Tells Us About Average RESPONSE ( HEIGHT OF DATA ). Regularizing po pulls THE whole model down FROM WHERE Should BE 36

37 Ridge Regression Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 px k=1 2 k o This form of penalty term is called Ridge Regression. But there are many more 37

38 Sinusoidal Data with Ridge Regression Watch the coefficients 38

39 Sinusoidal Data with Ridge Regression Watch the coefficients 39

40 Sinusoidal Data with Ridge Regression Watch the coefficients 40

41 Sinusoidal Data with Ridge Regression Watch the coefficients 41

42 Sinusoidal Data with Ridge Regression Watch the coefficients 42

43 Sinusoidal Data with Ridge Regression Watch the coefficients NOTE : COEFFICIENTS STAYED Small! 43

44 Example: Sinusoidal Data No Regularization Ridge Regression BLOW - UP HAPPENS MUCH LATER! 44

45 Example: Sinusoidal Data Regularization Study: o Choose poly degree o Vary over reg strength o Look for best validation MSE 45

46 Example: Sinusoidal Data Regularization Study: o Choose poly degree o Vary over reg strength o Look at size of coefficients COEFFS As SHRINK oo 46

47 Regularization Wrap-Up You should always do regularization. If you choose carefully, it will always help Generalization Next Time: o Talk more about Ridge Regression Details o Learning Curves and what they tell us about the Bias-Variance Trade-Off 47

48 48

49 49

50 50

51 51

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.

Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods. TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin

More information

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a

More information

Linear model selection and regularization

Linear model selection and regularization Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It

More information

Multiple Linear Regression for the Supervisor Data

Multiple Linear Regression for the Supervisor Data for the Supervisor Data Rating 40 50 60 70 80 90 40 50 60 70 50 60 70 80 90 40 60 80 40 60 80 Complaints Privileges 30 50 70 40 60 Learn Raises 50 70 50 70 90 Critical 40 50 60 70 80 30 40 50 60 70 80

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

Ridge Regression: Regulating overfitting when using many features

Ridge Regression: Regulating overfitting when using many features Ridge Regression: Regulating overfitting when using man features Emil Fox Universit of Washington April 10, 2018 Training, true, & test error vs. model complexit Overfitting if: Error Model complexit 2

More information

CS395T: Structured Models for NLP Lecture 2: Machine Learning Review

CS395T: Structured Models for NLP Lecture 2: Machine Learning Review CS395T: Structured Models for NLP Lecture 2: Machine Learning Review Greg Durrett Some slides adapted from Vivek Srikumar, University of Utah Administrivia Course enrollment Lecture slides posted on website

More information

Machine Learning and Data Mining. Linear regression. Kalev Kask

Machine Learning and Data Mining. Linear regression. Kalev Kask Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance

More information

Statistical Methods for SVM

Statistical Methods for SVM Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,

More information

Ridge Regression: Regulating overfitting when using many features. Training, true, & test error vs. model complexity. CSE 446: Machine Learning

Ridge Regression: Regulating overfitting when using many features. Training, true, & test error vs. model complexity. CSE 446: Machine Learning Ridge Regression: Regulating overfitting when using many features Emily Fox University of Washington January 3, 207 Training, true, & test error vs. model complexity Overfitting if: Error y Model complexity

More information

Lecture 5: A step back

Lecture 5: A step back Lecture 5: A step back Last time Last time we talked about a practical application of the shrinkage idea, introducing James-Stein estimation and its extension We saw our first connection between shrinkage

More information

Linear Regression (continued)

Linear Regression (continued) Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables? Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do

More information

Lecture 3: Just a little more math

Lecture 3: Just a little more math Lecture 3: Just a little more math Last time Through simple algebra and some facts about sums of normal random variables, we derived some basic results about orthogonal regression We used as our major

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Rirdge Regression. Szymon Bobek. Institute of Applied Computer science AGH University of Science and Technology

Rirdge Regression. Szymon Bobek. Institute of Applied Computer science AGH University of Science and Technology Rirdge Regression Szymon Bobek Institute of Applied Computer science AGH University of Science and Technology Based on Carlos Guestrin adn Emily Fox slides from Coursera Specialization on Machine Learnign

More information

Math 651 Introduction to Numerical Analysis I Fall SOLUTIONS: Homework Set 1

Math 651 Introduction to Numerical Analysis I Fall SOLUTIONS: Homework Set 1 ath 651 Introduction to Numerical Analysis I Fall 2010 SOLUTIONS: Homework Set 1 1. Consider the polynomial f(x) = x 2 x 2. (a) Find P 1 (x), P 2 (x) and P 3 (x) for f(x) about x 0 = 0. What is the relation

More information

5 DAY YOUR BUSINESS CHALLENGE WORKBOOK

5 DAY YOUR BUSINESS CHALLENGE WORKBOOK 5 DAY YOUR BUSINESS CHALLENGE WORKBOOK Fire Up Your Business So you ve started a business and are passionate about cultivating this gig into your main income to create a life of freedom, flexibility and

More information

6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review. Wednesday, March 18, 15 6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

More information

Machine Learning and Data Mining. Linear regression. Prof. Alexander Ihler

Machine Learning and Data Mining. Linear regression. Prof. Alexander Ihler + Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve

More information

CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015

CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what

More information

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016 CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday

More information

Lecture 14: Shrinkage

Lecture 14: Shrinkage Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the

More information

Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

Gravity and Orbits. 1. Choose the picture you think shows the gravity forces on the Earth and the Sun.

Gravity and Orbits. 1. Choose the picture you think shows the gravity forces on the Earth and the Sun. Name: Grade: Gravity and Orbits Pre-lab 1. Choose the picture you think shows the gravity forces on the Earth and the Sun. (a longer arrow to represents a big force, and a shorter arrow represent a smaller

More information

MS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015

MS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015 MS&E 226 In-Class Midterm Examination Solutions Small Data October 20, 2015 PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates

More information

Gravity and Orbits Activity Page 1. Name: Grade: Gravity and Orbits. Pre-lab. 1. In the picture below, draw how you think Earth moves.

Gravity and Orbits Activity Page 1. Name: Grade: Gravity and Orbits. Pre-lab. 1. In the picture below, draw how you think Earth moves. Name: Grade: Gravity and Orbits Pre-lab 1. In the picture below, draw how you think Earth moves. 2. Draw a picture using arrows to show what you think the forces might be on the Earth and the Sun. You

More information

Linear classifiers Lecture 3

Linear classifiers Lecture 3 Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

Sequences and infinite series

Sequences and infinite series Sequences and infinite series D. DeTurck University of Pennsylvania March 29, 208 D. DeTurck Math 04 002 208A: Sequence and series / 54 Sequences The lists of numbers you generate using a numerical method

More information

Reinforcement Learning Wrap-up

Reinforcement Learning Wrap-up Reinforcement Learning Wrap-up Slides courtesy of Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

More information

LIMITS AT INFINITY MR. VELAZQUEZ AP CALCULUS

LIMITS AT INFINITY MR. VELAZQUEZ AP CALCULUS LIMITS AT INFINITY MR. VELAZQUEZ AP CALCULUS RECALL: VERTICAL ASYMPTOTES Remember that for a rational function, vertical asymptotes occur at values of x = a which have infinite its (either positive or

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013 Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you

More information

Neural Networks in Structured Prediction. November 17, 2015

Neural Networks in Structured Prediction. November 17, 2015 Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving

More information

Terminology for Statistical Data

Terminology for Statistical Data Terminology for Statistical Data variables - features - attributes observations - cases (consist of multiple values) In a standard data matrix, variables or features correspond to columns observations

More information

CSC 411: Lecture 02: Linear Regression

CSC 411: Lecture 02: Linear Regression CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

Regression Analysis IV... More MLR and Model Building

Regression Analysis IV... More MLR and Model Building Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression

More information

MS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari MS&E 226: Small Data Lecture 6: Bias and variance (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 47 Our plan today We saw in last lecture that model scoring methods seem to be trading o two di erent

More information

COS 424: Interacting with Data

COS 424: Interacting with Data COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables? Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 02: Linear Regression 1 I d rather die than telling you my password! Transfer success! 2 Outline Announcements Linear Regression Regularization Cross-validation 3 Outline

More information

Lecture 2: Linear regression

Lecture 2: Linear regression Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued

More information

MATH 320, WEEK 11: Eigenvalues and Eigenvectors

MATH 320, WEEK 11: Eigenvalues and Eigenvectors MATH 30, WEEK : Eigenvalues and Eigenvectors Eigenvalues and Eigenvectors We have learned about several vector spaces which naturally arise from matrix operations In particular, we have learned about the

More information

Machine Learning & Data Mining CMS/CS/CNS/EE 155. Lecture 2: Perceptron & Gradient Descent

Machine Learning & Data Mining CMS/CS/CNS/EE 155. Lecture 2: Perceptron & Gradient Descent Machine Learning & Data Mining CMS/CS/CNS/EE 155 Lecture 2: Perceptron & Gradient Descent Announcements Homework 1 is out Due Tuesday Jan 12 th at 2pm Via Moodle Sign up for Moodle & Piazza if you haven

More information

CSC321 Lecture 2: Linear Regression

CSC321 Lecture 2: Linear Regression CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,

More information

Sample questions for Fundamentals of Machine Learning 2018

Sample questions for Fundamentals of Machine Learning 2018 Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure

More information

Week 5: Logistic Regression & Neural Networks

Week 5: Logistic Regression & Neural Networks Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review

More information

Machine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)

Machine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods) Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March

More information

Partial Fractions. June 27, In this section, we will learn to integrate another class of functions: the rational functions.

Partial Fractions. June 27, In this section, we will learn to integrate another class of functions: the rational functions. Partial Fractions June 7, 04 In this section, we will learn to integrate another class of functions: the rational functions. Definition. A rational function is a fraction of two polynomials. For example,

More information

CSC 411 Lecture 6: Linear Regression

CSC 411 Lecture 6: Linear Regression CSC 411 Lecture 6: Linear Regression Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 06-Linear Regression 1 / 37 A Timely XKCD UofT CSC 411: 06-Linear Regression

More information

Advanced Placement Physics C Summer Assignment

Advanced Placement Physics C Summer Assignment Advanced Placement Physics C Summer Assignment Summer Assignment Checklist: 1. Book Problems. Selected problems from Fundamentals of Physics. (Due August 31 st ). Intro to Calculus Packet. (Attached) (Due

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

STATISTICS 1 REVISION NOTES

STATISTICS 1 REVISION NOTES STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

...but you just watched a huge amount of science taking place! Let s look a little closer at what just happened!

...but you just watched a huge amount of science taking place! Let s look a little closer at what just happened! Chapter 2: Page 10 Chapter 2: Page 11 Almost everyone gets into a car every day. Let s think about some of the steps it takes to drive a car. The first thing you should do, of course, is put on your seat

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

High-Dimensional Statistical Learning: Introduction

High-Dimensional Statistical Learning: Introduction Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/

More information

Math 1553 Introduction to Linear Algebra. School of Mathematics Georgia Institute of Technology

Math 1553 Introduction to Linear Algebra. School of Mathematics Georgia Institute of Technology Math 1553 Introduction to Linear Algebra School of Mathematics Georgia Institute of Technology Resources There are links from T-Square. Everything is on the course web page. Master Webpage Section J How

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Lecture 10: F -Tests, ANOVA and R 2

Lecture 10: F -Tests, ANOVA and R 2 Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2017

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2017 CPSC 340: Machine Learning and Data Mining Regularization Fall 2017 Assignment 2 Admin 2 late days to hand in tonight, answers posted tomorrow morning. Extra office hours Thursday at 4pm (ICICS 246). Midterm

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Tufts COMP 135: Introduction to Machine Learning

Tufts COMP 135: Introduction to Machine Learning Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)

More information

Bias-Variance Trade-off in ML. Sargur Srihari

Bias-Variance Trade-off in ML. Sargur Srihari Bias-Variance Trade-off in ML Sargur srihari@cedar.buffalo.edu 1 Bias-Variance Decomposition 1. Model Complexity in Linear Regression 2. Point estimate Bias-Variance in Statistics 3. Bias-Variance in Regression

More information

Low Bias Bagged Support Vector Machines

Low Bias Bagged Support Vector Machines Low Bias Bagged Support Vector Machines Giorgio Valentini Dipartimento di Scienze dell Informazione Università degli Studi di Milano, Italy valentini@dsi.unimi.it Thomas G. Dietterich Department of Computer

More information

Fundamentals of Machine Learning

Fundamentals of Machine Learning Fundamentals of Machine Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://icapeople.epfl.ch/mekhan/ emtiyaz@gmail.com Jan 2, 27 Mohammad Emtiyaz Khan 27 Goals Understand (some) fundamentals of Machine

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc. Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

University of Maryland Department of Physics

University of Maryland Department of Physics Spring 3 University of Maryland Department of Physics Laura Lising Physics 1 March 6, 3 Exam #1 nswer all questions on these sheets. Please write clearly and neatly: We can only give you credit for what

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

ESS2222. Lecture 3 Bias-Variance Trade-off

ESS2222. Lecture 3 Bias-Variance Trade-off ESS2222 Lecture 3 Bias-Variance Trade-off Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Bias-Variance Trade-off Overfitting & Regularization Ridge & Lasso Regression Nonlinear

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann

Machine Learning! in just a few minutes. Jan Peters Gerhard Neumann Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often

More information

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation? Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation

More information

Learning theory Lecture 4

Learning theory Lecture 4 Learning theory Lecture 4 David Sontag New York University Slides adapted from Carlos Guestrin & Luke Zettlemoyer What s next We gave several machine learning algorithms: Perceptron Linear support vector

More information

Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline

Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

Calculus: What is a Limit? (understanding epislon-delta proofs)

Calculus: What is a Limit? (understanding epislon-delta proofs) Calculus: What is a Limit? (understanding epislon-delta proofs) Here is the definition of a limit: Suppose f is a function. We say that Lim aa ff() = LL if for every εε > 0 there is a δδ > 0 so that if

More information

Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis

More information

CSC321 Lecture 4 The Perceptron Algorithm

CSC321 Lecture 4 The Perceptron Algorithm CSC321 Lecture 4 The Perceptron Algorithm Roger Grosse and Nitish Srivastava January 17, 2017 Roger Grosse and Nitish Srivastava CSC321 Lecture 4 The Perceptron Algorithm January 17, 2017 1 / 1 Recap:

More information

MA 1125 Lecture 33 - The Sign Test. Monday, December 4, Objectives: Introduce an example of a non-parametric test.

MA 1125 Lecture 33 - The Sign Test. Monday, December 4, Objectives: Introduce an example of a non-parametric test. MA 1125 Lecture 33 - The Sign Test Monday, December 4, 2017 Objectives: Introduce an example of a non-parametric test. For the last topic of the semester we ll look at an example of a non-parametric test.

More information

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted

More information