Polynomial Regression and Regularization
|
|
- Beatrix Wright
- 5 years ago
- Views:
Transcription
1 Polynomial Regression and Regularization
2 Administrivia o If you still haven t enrolled in Moodle yet Enrollment key on Piazza If joined course recently, me to get added to Piazza o Homework 1 posted later tonight. Good Milestones: Problems 1-3 This Week Problems 4 and 5 Next Week 2
3 The RoadMap o Last Time: Regression Refresher (there was nothing fresh about it) o This Time: Polynomial Regression Regularization (wiggles are bad, Man) o Next Time: Bias-Variance Trade-Off (what does it all MEAN?) 3
4 Previously on CSCI 4622 Given training data (x i1,x i2,...,x ip,y i ) for i =1, 2,...,n fit a regression of the form y i = x i1 + 2 x i2 + + p x ip + i Estimates of the parameters are found by minimizing RSS = Construct the design matrix X For the moment, solve via normal equations: where i N(0, 2 ) nx [( x i1 + + p x ip ) y i ] 2 = kx yk 2 i=1 by prepending column of 1s to data matrix ˆ = f X T X 1 X T y LOSS Function 4
5 So, We Can Model Things Like This 5
6 And Things Like This Y X 2 X 1 6
7 But What About Things Like This? 7
8 But What About Things Like This? o So yeah, nonlinearity is a thing o Clearly, the linear models of regression cannot handle this. o Give me Neural Networks or Give me Death! 8
9 But What About Things Like This? o So yeah, nonlinearity is a thing o Clearly, the linear models of regression cannot handle this. o Give me Neural Networks or Give me Death! o Nah. Regression can totally do this 9
10 Start with the Obvious: Polynomials o Suppose, as in the previous pictures, we have a single feature X o We want to go from this: Y = X + o To something like this: Y = X + 2 X p X p + o So what is the difference between these two? 10
11 Start with the Obvious: Polynomials o Suppose, as in the previous pictures, we have a single feature X o We want to go from this: Y = X + o To something like this: Y = X + 2 X p X p + o So what is the difference between these two? o Seems like it s closer to this: Y = X X p X p + 11
12 The One Becomes the Many o Start with a single feature X o And derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o These two things: Y = X + 2 X p X p + o are exactly the same: Y = X X p X p + 12
13 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? 13
14 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? Before: x 11 x 21 x 1p 1 x 21 x 22 x 2p 1 x 31 x 32 x 3p x n1 x n2 x np p 3 2 = y 1 y 2 y 3. y n
15 The Polynomial Regression Pipeline o Start with a single feature X o Derive new polynomial features: X 1 = X, X 2 = X 2, X p = X p o Solve the MLR in the usual way: Y = X X p X p + o Question: What does the Matrix Equation look like? After: x 1 x 2 1 x p 1 1 x 2 x 2 2 x p 2 1 x 3 x 2 2 x p x n x 2 n x p n p 3 2 = y 1 y 2 y 3. y n
16 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? HAVE TEST point Xo NEED to PERFORM SAME Transformation on Xo THAT WE PERFORMED data. xo ( 1,., On X5,...,XoP) THEN PRE Pichon IS just o= Xotp training 16
17 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? o Important Theme #1: If you transform your features for training, predicting is always as easy transforming the features of your new data in the same way. Input space - FEATURE SPACE Training X X, Xn,... XP Test Xo Xo MAGIC, Xp,.. Happens, XOP HERE! 17
18 The Polynomial Regression Pipeline o Suppose I learn the parameters from the training set o What if I want to make a prediction about data I haven t seen yet? o Important Theme #1: If you transform your features for training, predicting is always as easy transforming the features of your new data in the same way. o Important Theme #2: If your current features are not flexible enough, moving into a higher dimensional space will make your model more flexible (but BEWARE!) 18
19 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? 19
20 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 1 20
21 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 4 21
22 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 7 22
23 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 11 23
24 Example: Sinusoidal Data True Model is f(x) =sin( x)+ o Question: What degree polynomial should I use? o Degree = 11 o What Questions should we be asking? 24
25 The Real ML Pipeline (Almost) The more flexible (powerful) your model gets, the better it ll do on training set But that s not useful. We want to know how model will do on new data How well it ll Generalize A better pipeline: RANDOM on TRAIN TEST on TH 'S THIS SPLIT ) ) L L All Labeled Data Training Set Validation Set Train on training set. Validate (or evaluate) on validation set. HELD DATA - OUT 25
26 The Real ML Pipeline (Almost) Need a performance measure: For Regression, we use the Mean-Squared Error: MSE = 1 n nx (y i ŷ i ) 2 I=1 - y TRUE RESPONSE predictions All Labeled Data Training Set Validation Set 26
27 Example: Sinusoidal Data What happens to the MSE on training set as p grows? What happens to MSE on validation set as p grows? MSE goes DOWNMSE goes Down At±T, then BACK-Up! 27
28 Example: Sinusoidal Data FLEXIBLE Enough MODEL / OUERFITT HAS STARTED ' ng 28
29 Example: Sinusoidal Data Revisit Polynomial Regression What causes those wiggles that are hurting us so bad? 29
30 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 30
31 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 31
32 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 32
33 Example: Sinusoidal Data True Model is f(x) =sin( x)+ 33
34 Example: Sinusoidal Data Coefficients grow very large. Need a way to control them. Regulate them even 34
35 Regularization Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much minimize RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 o Regularization weight of is a tuning parameter px k=1 2 k Tenafly o Have to do regularization study to choose best value 35
36 Regularization Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 px k=1 2 k o Why do we not regularize the bias parameter? BIAS Tells Us About Average RESPONSE ( HEIGHT OF DATA ). Regularizing po pulls THE whole model down FROM WHERE Should BE 36
37 Ridge Regression Goal: Keep model coefficients from growing too large Idea: Put something in the loss function to dissuade them growing too much RSS = nx [( x i1 + + p x ip ) y i ] 2 + i=1 px k=1 2 k o This form of penalty term is called Ridge Regression. But there are many more 37
38 Sinusoidal Data with Ridge Regression Watch the coefficients 38
39 Sinusoidal Data with Ridge Regression Watch the coefficients 39
40 Sinusoidal Data with Ridge Regression Watch the coefficients 40
41 Sinusoidal Data with Ridge Regression Watch the coefficients 41
42 Sinusoidal Data with Ridge Regression Watch the coefficients 42
43 Sinusoidal Data with Ridge Regression Watch the coefficients NOTE : COEFFICIENTS STAYED Small! 43
44 Example: Sinusoidal Data No Regularization Ridge Regression BLOW - UP HAPPENS MUCH LATER! 44
45 Example: Sinusoidal Data Regularization Study: o Choose poly degree o Vary over reg strength o Look for best validation MSE 45
46 Example: Sinusoidal Data Regularization Study: o Choose poly degree o Vary over reg strength o Look at size of coefficients COEFFS As SHRINK oo 46
47 Regularization Wrap-Up You should always do regularization. If you choose carefully, it will always help Generalization Next Time: o Talk more about Ridge Regression Details o Learning Curves and what they tell us about the Bias-Variance Trade-Off 47
48 48
49 49
50 50
51 51
Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationReminders. Thought questions should be submitted on eclass. Please list the section related to the thought question
Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationMultiple Linear Regression for the Supervisor Data
for the Supervisor Data Rating 40 50 60 70 80 90 40 50 60 70 50 60 70 80 90 40 60 80 40 60 80 Complaints Privileges 30 50 70 40 60 Learn Raises 50 70 50 70 90 Critical 40 50 60 70 80 30 40 50 60 70 80
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationRidge Regression: Regulating overfitting when using many features
Ridge Regression: Regulating overfitting when using man features Emil Fox Universit of Washington April 10, 2018 Training, true, & test error vs. model complexit Overfitting if: Error Model complexit 2
More informationCS395T: Structured Models for NLP Lecture 2: Machine Learning Review
CS395T: Structured Models for NLP Lecture 2: Machine Learning Review Greg Durrett Some slides adapted from Vivek Srikumar, University of Utah Administrivia Course enrollment Lecture slides posted on website
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationRidge Regression: Regulating overfitting when using many features. Training, true, & test error vs. model complexity. CSE 446: Machine Learning
Ridge Regression: Regulating overfitting when using many features Emily Fox University of Washington January 3, 207 Training, true, & test error vs. model complexity Overfitting if: Error y Model complexity
More informationLecture 5: A step back
Lecture 5: A step back Last time Last time we talked about a practical application of the shrinkage idea, introducing James-Stein estimation and its extension We saw our first connection between shrinkage
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationRegression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.
Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationLecture 3: Just a little more math
Lecture 3: Just a little more math Last time Through simple algebra and some facts about sums of normal random variables, we derived some basic results about orthogonal regression We used as our major
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationRirdge Regression. Szymon Bobek. Institute of Applied Computer science AGH University of Science and Technology
Rirdge Regression Szymon Bobek Institute of Applied Computer science AGH University of Science and Technology Based on Carlos Guestrin adn Emily Fox slides from Coursera Specialization on Machine Learnign
More informationMath 651 Introduction to Numerical Analysis I Fall SOLUTIONS: Homework Set 1
ath 651 Introduction to Numerical Analysis I Fall 2010 SOLUTIONS: Homework Set 1 1. Consider the polynomial f(x) = x 2 x 2. (a) Find P 1 (x), P 2 (x) and P 3 (x) for f(x) about x 0 = 0. What is the relation
More information5 DAY YOUR BUSINESS CHALLENGE WORKBOOK
5 DAY YOUR BUSINESS CHALLENGE WORKBOOK Fire Up Your Business So you ve started a business and are passionate about cultivating this gig into your main income to create a life of freedom, flexibility and
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationMachine Learning and Data Mining. Linear regression. Prof. Alexander Ihler
+ Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve
More informationCSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015
CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what
More informationCPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016
CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationGravity and Orbits. 1. Choose the picture you think shows the gravity forces on the Earth and the Sun.
Name: Grade: Gravity and Orbits Pre-lab 1. Choose the picture you think shows the gravity forces on the Earth and the Sun. (a longer arrow to represents a big force, and a shorter arrow represent a smaller
More informationMS&E 226. In-Class Midterm Examination Solutions Small Data October 20, 2015
MS&E 226 In-Class Midterm Examination Solutions Small Data October 20, 2015 PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates
More informationGravity and Orbits Activity Page 1. Name: Grade: Gravity and Orbits. Pre-lab. 1. In the picture below, draw how you think Earth moves.
Name: Grade: Gravity and Orbits Pre-lab 1. In the picture below, draw how you think Earth moves. 2. Draw a picture using arrows to show what you think the forces might be on the Earth and the Sun. You
More informationLinear classifiers Lecture 3
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham
More informationProbability Distributions
CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas
More informationSequences and infinite series
Sequences and infinite series D. DeTurck University of Pennsylvania March 29, 208 D. DeTurck Math 04 002 208A: Sequence and series / 54 Sequences The lists of numbers you generate using a numerical method
More informationReinforcement Learning Wrap-up
Reinforcement Learning Wrap-up Slides courtesy of Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationLIMITS AT INFINITY MR. VELAZQUEZ AP CALCULUS
LIMITS AT INFINITY MR. VELAZQUEZ AP CALCULUS RECALL: VERTICAL ASYMPTOTES Remember that for a rational function, vertical asymptotes occur at values of x = a which have infinite its (either positive or
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More informationTerminology for Statistical Data
Terminology for Statistical Data variables - features - attributes observations - cases (consist of multiple values) In a standard data matrix, variables or features correspond to columns observations
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More information4 Bias-Variance for Ridge Regression (24 points)
Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,
More informationRegression Analysis IV... More MLR and Model Building
Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression
More informationMS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari
MS&E 226: Small Data Lecture 6: Bias and variance (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 47 Our plan today We saw in last lecture that model scoring methods seem to be trading o two di erent
More informationCOS 424: Interacting with Data
COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 02: Linear Regression 1 I d rather die than telling you my password! Transfer success! 2 Outline Announcements Linear Regression Regularization Cross-validation 3 Outline
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationMATH 320, WEEK 11: Eigenvalues and Eigenvectors
MATH 30, WEEK : Eigenvalues and Eigenvectors Eigenvalues and Eigenvectors We have learned about several vector spaces which naturally arise from matrix operations In particular, we have learned about the
More informationMachine Learning & Data Mining CMS/CS/CNS/EE 155. Lecture 2: Perceptron & Gradient Descent
Machine Learning & Data Mining CMS/CS/CNS/EE 155 Lecture 2: Perceptron & Gradient Descent Announcements Homework 1 is out Due Tuesday Jan 12 th at 2pm Via Moodle Sign up for Moodle & Piazza if you haven
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationPartial Fractions. June 27, In this section, we will learn to integrate another class of functions: the rational functions.
Partial Fractions June 7, 04 In this section, we will learn to integrate another class of functions: the rational functions. Definition. A rational function is a fraction of two polynomials. For example,
More informationCSC 411 Lecture 6: Linear Regression
CSC 411 Lecture 6: Linear Regression Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 06-Linear Regression 1 / 37 A Timely XKCD UofT CSC 411: 06-Linear Regression
More informationAdvanced Placement Physics C Summer Assignment
Advanced Placement Physics C Summer Assignment Summer Assignment Checklist: 1. Book Problems. Selected problems from Fundamentals of Physics. (Due August 31 st ). Intro to Calculus Packet. (Attached) (Due
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationSTATISTICS 1 REVISION NOTES
STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More information...but you just watched a huge amount of science taking place! Let s look a little closer at what just happened!
Chapter 2: Page 10 Chapter 2: Page 11 Almost everyone gets into a car every day. Let s think about some of the steps it takes to drive a car. The first thing you should do, of course, is put on your seat
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More informationHigh-Dimensional Statistical Learning: Introduction
Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/
More informationMath 1553 Introduction to Linear Algebra. School of Mathematics Georgia Institute of Technology
Math 1553 Introduction to Linear Algebra School of Mathematics Georgia Institute of Technology Resources There are links from T-Square. Everything is on the course web page. Master Webpage Section J How
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationLecture 10: F -Tests, ANOVA and R 2
Lecture 10: F -Tests, ANOVA and R 2 1 ANOVA We saw that we could test the null hypothesis that β 1 0 using the statistic ( β 1 0)/ŝe. (Although I also mentioned that confidence intervals are generally
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationChapter 27 Summary Inferences for Regression
Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationCPSC 340: Machine Learning and Data Mining. Regularization Fall 2017
CPSC 340: Machine Learning and Data Mining Regularization Fall 2017 Assignment 2 Admin 2 late days to hand in tonight, answers posted tomorrow morning. Extra office hours Thursday at 4pm (ICICS 246). Midterm
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationBias-Variance Trade-off in ML. Sargur Srihari
Bias-Variance Trade-off in ML Sargur srihari@cedar.buffalo.edu 1 Bias-Variance Decomposition 1. Model Complexity in Linear Regression 2. Point estimate Bias-Variance in Statistics 3. Bias-Variance in Regression
More informationLow Bias Bagged Support Vector Machines
Low Bias Bagged Support Vector Machines Giorgio Valentini Dipartimento di Scienze dell Informazione Università degli Studi di Milano, Italy valentini@dsi.unimi.it Thomas G. Dietterich Department of Computer
More informationFundamentals of Machine Learning
Fundamentals of Machine Learning Mohammad Emtiyaz Khan AIP (RIKEN), Tokyo http://icapeople.epfl.ch/mekhan/ emtiyaz@gmail.com Jan 2, 27 Mohammad Emtiyaz Khan 27 Goals Understand (some) fundamentals of Machine
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationChapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.
Chapter 8 Linear Regression Copyright 2010 Pearson Education, Inc. Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu: Copyright
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationUniversity of Maryland Department of Physics
Spring 3 University of Maryland Department of Physics Laura Lising Physics 1 March 6, 3 Exam #1 nswer all questions on these sheets. Please write clearly and neatly: We can only give you credit for what
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationESS2222. Lecture 3 Bias-Variance Trade-off
ESS2222 Lecture 3 Bias-Variance Trade-off Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Bias-Variance Trade-off Overfitting & Regularization Ridge & Lasso Regression Nonlinear
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationLinear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?
Did You Mean Association Or Correlation? AP Statistics Chapter 8 Be careful not to use the word correlation when you really mean association. Often times people will incorrectly use the word correlation
More informationLearning theory Lecture 4
Learning theory Lecture 4 David Sontag New York University Slides adapted from Carlos Guestrin & Luke Zettlemoyer What s next We gave several machine learning algorithms: Perceptron Linear support vector
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationCalculus: What is a Limit? (understanding epislon-delta proofs)
Calculus: What is a Limit? (understanding epislon-delta proofs) Here is the definition of a limit: Suppose f is a function. We say that Lim aa ff() = LL if for every εε > 0 there is a δδ > 0 so that if
More informationVariance Reduction and Ensemble Methods
Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis
More informationCSC321 Lecture 4 The Perceptron Algorithm
CSC321 Lecture 4 The Perceptron Algorithm Roger Grosse and Nitish Srivastava January 17, 2017 Roger Grosse and Nitish Srivastava CSC321 Lecture 4 The Perceptron Algorithm January 17, 2017 1 / 1 Recap:
More informationMA 1125 Lecture 33 - The Sign Test. Monday, December 4, Objectives: Introduce an example of a non-parametric test.
MA 1125 Lecture 33 - The Sign Test Monday, December 4, 2017 Objectives: Introduce an example of a non-parametric test. For the last topic of the semester we ll look at an example of a non-parametric test.
More informationCOMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation
COMP 55 Applied Machine Learning Lecture 2: Bayesian optimisation Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp55 Unless otherwise noted, all material posted
More information