UVA CS 4501: Machine Learning. Lecture 6: Linear Regression Model with Regularizations. Dr. Yanjun Qi. University of Virginia
1 UVA CS 4501: Machine Learning. Lecture 6: Linear Regression Model with Regularizations. Dr. Yanjun Qi, University of Virginia, Department of Computer Science
2 Where are we? → Five major sections of this course: Regression (supervised); Classification (supervised); Unsupervised models; Learning theory; Graphical models.
3 Today → Regression (supervised): four ways to train / perform optimization for linear regression models (Normal Equation; Gradient Descent (GD); Stochastic GD; Newton's method), and supervised regression models (linear regression (LR); LR with non-linear basis functions; locally weighted LR; LR with regularizations).
4 Today: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ WHY, and the influence of the regularization parameter.
5 A Dataset for regression: a continuous-valued target variable. Data/points/instances/examples/samples/records: [rows]. Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]. Target/outcome/response/label/dependent variable: the special column to be predicted [last column].
6 Dr. Yanjun Qi / SUPERVISED Regression: the training dataset consists of input-output pairs, where the target Y is a continuous variable; the learned model f(x?) predicts the output for a new input x?.
7 Review: Normal Equation for LR. Write the cost in matrix form, with X the n x p matrix whose rows are x_1^T, ..., x_n^T and y = (y_1, ..., y_n)^T:
J(β) = (1/2) Σ_{i=1}^n (x_i^T β - y_i)^2 = (1/2)(Xβ - y)^T (Xβ - y) = (1/2)(β^T X^T X β - β^T X^T y - y^T X β + y^T y).
To minimize J(β), take the derivative and set it to zero: X^T X β = X^T y (the normal equations), so β* = (X^T X)^{-1} X^T y, assuming X^T X is invertible.
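The derivation above can be checked numerically; below is a minimal NumPy sketch (the synthetic data and variable names are mine, not from the slides):

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=n)

# Normal equations: X^T X beta = X^T y.
# Solving the linear system is preferred to forming (X^T X)^{-1} explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)
```

With little noise, `beta` recovers `true_beta` almost exactly; `np.linalg.solve` is used rather than an explicit matrix inverse, which would be slower and less numerically stable.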
8 Comments on the normal equations: what if X has less than full column rank? → X^T X is not invertible → add regularization.
9 Page 11 of Handout.
10 e.g., a practical application of a regression model (Proceedings of HLT 2010, Human Language Technologies).
11 e.g., "Movie Reviews and Revenues: An Experiment in Text Regression", Proceedings of HLT '10 (about 1.7k samples n and >3k features, e.g., counts of an n-gram in the text).
12 e.g., QSAR drug screening: binding to thrombin (DuPont Pharmaceuticals). n: 2543 compounds tested for their ability to bind to a target site on thrombin (a human enzyme); p: 139,351 binary features, which describe three-dimensional properties of the molecules. (Weston et al., Bioinformatics.)
13 Today: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ influence of the regularization parameter.
14 Review: Vector norms. A norm of a vector x is, informally, a measure of the length of the vector. Common norms: L1, L2 (Euclidean), and L-infinity.
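As a quick concrete check of these definitions (a toy vector of my choosing):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1 = np.abs(x).sum()          # L1 norm: sum of absolute values -> 7.0
l2 = np.sqrt((x ** 2).sum())  # L2 (Euclidean) norm: sqrt(9 + 16) -> 5.0
linf = np.abs(x).max()        # L-infinity norm: largest absolute entry -> 4.0
```

The same values come from `np.linalg.norm(x, ord)` with `ord` set to 1, 2, or `np.inf`.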
15 Review: Vector norm (L2, when p = 2).
16 (Figure) Norm contours and the Lasso.
17 Ridge Regression / L2 Regularization. β* = (X^T X)^{-1} X^T y; if X^T X is not invertible, a classical solution is to add a small positive element to the diagonal: β* = (X^T X + λI)^{-1} X^T y. By convention, the bias/intercept term is typically not regularized; here we assume the data has been centered, so there is no bias term.
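A minimal NumPy sketch of this fix, on synthetic data of my own (the final comparison reflects a known property of the L2 penalty: the ridge solution never has a larger norm than the OLS solution):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + 0.1 * rng.normal(size=n)

lam = 1.0
# Ridge: beta* = (X^T X + lambda * I)^{-1} X^T y.
# X^T X + lambda * I is positive definite (hence invertible) for any lambda > 0.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
# Ordinary least squares, for comparison.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

Here `beta_ridge` is a shrunken version of `beta_ols`: its Euclidean norm is smaller.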
18 Positive definite matrix: one important property of positive definite matrices is that → they are always full rank, and hence invertible. (Extra: see the proof in the Linear-Algebra handout.)
19 β* = (X^T X + λI)^{-1} X^T y
20 Ridge Regression / Squared Loss + L2. β* = (X^T X + λI)^{-1} X^T y. As in the solution from HW2, β_ridge = argmin_β (y - Xβ)^T (y - Xβ) + λ β^T β; to minimize, take the derivative and set it to zero. By convention, the bias/intercept term is typically not regularized; here we assume the data has been centered, so there is no bias term.
21 Ridge Regression / Squared Loss + L2. β* = (X^T X + λI)^{-1} X^T y; β_ridge = argmin_β (y - Xβ)^T (y - Xβ) + λ β^T β (take the derivative and set it to zero). Equivalently, β_ridge = argmin_β (y - Xβ)^T (y - Xβ) subject to Σ_{j=1}^p β_j^2 ≤ s^2. By convention, the bias/intercept term is typically not regularized; here we assume the data has been centered, so there is no bias term.
22 (Figure) Contour lines from ridge regression: the constraint region Σ_{j=1}^p β_j^2 ≤ s^2 (a disk of radius s) and the least-squares solution.
23 (Figure) Contour lines from ridge regression: the ridge regression solution vs. the least-squares solution.
24 (Figure) Least squares + L2: the ridge solution.
25 Ridge Regression / Squared Loss + L2. λ > 0 penalizes each β_j (see whiteboard); if λ = 0, we get the least-squares estimator; if λ → ∞, then β_j → 0.
26 Parameter shrinkage: β_OLS = (X^T X)^{-1} X^T y vs. β_Ridge = (X^T X + λI)^{-1} X^T y. When λ = 0 the two coincide; as λ grows, the ridge coefficients shrink toward zero. See page 65 of ESL: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
27 ✓ Influence of the regularization parameter λ (figure)
28 ✓ Influence of the regularization parameter λ (figure)
29 ✓ Influence of the regularization parameter λ (figure)
30 Today: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ WHY, and the influence of the regularization parameter.
31 (2) Lasso (least absolute shrinkage and selection operator) / Squared Loss + L1. The lasso is a shrinkage method like ridge, but it acts in a nonlinear manner on the outcome y. The lasso is defined by β_lasso = argmin_β (y - Xβ)^T (y - Xβ) = argmin_β Σ_{i=1}^n (y_i - x_i^T β)^2, subject to Σ_j |β_j| ≤ s. By convention, the bias/intercept term is typically not regularized; here we assume the data has been centered, so there is no bias term.
32 Lasso (least absolute shrinkage and selection operator): suppose we are in 2 dimensions, β = (β_1, β_2). The constraint region |β_1| + |β_2| ≤ const is a diamond with edges β_1 + β_2 = const, β_1 - β_2 = const, -β_1 + β_2 = const, and -β_1 - β_2 = const. (Figure: the lasso solution vs. the least-squares solution.)
33 In the figure, the lasso constraint has eliminated the role of x_2, leading to sparsity. (Figure: the lasso solution vs. the least-squares solution.)
34 (Figure) Lasso vs. ridge regression constraint regions.
35 Lasso (least absolute shrinkage and selection operator): notice that the ridge penalty Σ_j β_j^2 is replaced by Σ_j |β_j|. Due to the nature of the constraint, if the tuning parameter s is chosen small enough, the lasso will set some coefficients exactly to zero.
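The exact zeros can be observed with a small coordinate-descent solver based on soft-thresholding, a standard way to fit the lasso (this sketch and its synthetic data are illustrative additions of mine, not the method prescribed by the lecture):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form 1-D lasso solution."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2)*||y - X b||^2 + lam * ||b||_1 (centered data)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with coordinate j's contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]      # only 3 of the 10 features matter
y = X @ true_beta + 0.1 * rng.normal(size=n)

beta_hat = lasso_cd(X, y, lam=20.0)   # many coefficients come out exactly 0.0
```

Unlike ridge, which only shrinks coefficients toward zero, the soft-threshold step produces exact zeros for the irrelevant features while keeping the three true signals.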
36 Lasso for p > n. Prediction accuracy and model interpretability are two important aspects of regression models; LASSO does shrinkage and variable selection simultaneously, for better prediction and a more interpretable model. Disadvantages: in the p > n case, the lasso selects at most n variables before it saturates; and if there is a group of variables among which the pairwise correlations are very high, the lasso tends to select only one variable from the group.
37 Lasso: implicit feature selection. (Figure: the n x p design matrix X.)
39 Today: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ WHY, and the influence of the regularization parameter.
40 (3) Hybrid of Ridge and Lasso: Elastic Net regularization. The L1 part of the penalty generates a sparse model; the L2 part of the penalty removes the limitation on the number of selected variables, encourages a grouping effect, and stabilizes the L1 regularization path.
41 Naïve elastic net. For any fixed non-negative λ_1 and λ_2, the naive elastic net criterion is L(λ_1, λ_2, β) = (y - Xβ)^T (y - Xβ) + λ_2 Σ_j β_j^2 + λ_1 Σ_j |β_j|; the naive elastic net estimator is the minimizer of this criterion.
42 (Figure) Geometry of the elastic net.
43 "Movie Reviews and Revenues: An Experiment in Text Regression", Proceedings of HLT '10 (Human Language Technologies): use linear regression to directly predict the opening-weekend gross earnings, denoted y, based on features x extracted from the movie metadata and/or the text of the reviews.
44 Advantage of the elastic net (Extra): the elastic net problem can be converted to a lasso problem with an augmented data form. In the augmented problem, the sample size is n + p and X* has rank p → the elastic net can potentially select all p predictors; the naive elastic net can thus perform automatic variable selection like the lasso.
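The augmented-data trick can be verified in the special case λ_1 = 0, where the elastic net reduces to ridge and the augmentation turns the L2 penalty into ordinary least squares on p artificial rows (a sketch with made-up data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam2 = 2.0

# Stack sqrt(lam2) * I under X and p zeros under y: the squared error on the
# artificial rows equals the L2 penalty lam2 * ||b||^2.
X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

beta_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)
beta_aug = np.linalg.lstsq(X_aug, y_aug, rcond=None)[0]
# beta_aug and beta_ridge coincide; adding an L1 penalty to the augmented
# least-squares problem then yields the naive elastic net as a lasso.
```

The augmented design always has full column rank p (because of the identity block), which is why the augmented lasso can select all p predictors even when p > n.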
45 Regularized multivariate linear regression. Task: regression. Representation: Y = weighted linear sum of the X's. Score function: least squares + regularization, min_β J(β) = Σ_{i=1}^n (Y_i - Ŷ_i)^2 + λ (Σ_{j=1}^p |β_j|^q)^{1/q}. Search/optimization: linear algebra for Ridge; sub-gradient descent for Lasso & Elastic net. Models/parameters: regression coefficients (regularized weights).
46 Today: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ WHY, and the influence of the regularization parameter.
47 Why use simpler models? Because they are: simpler to use (lower complexity); easier to train (need fewer examples); less sensitive to noise; easier to explain (more interpretable); and they generalize better (lower variance; Occam's razor). More in future lectures! (Adapted from Dr. Christoph Eick's slides.)
48 Model selection & generalisation: learn a function / hypothesis from past data in order to explain, predict, model or control new data examples. Underfitting: the model is too simple, and both training and test errors are large. Overfitting: the model is too complex, and test errors are large although training errors are small; after learning, the model has effectively learned the noise.
49 Issue: overfitting and underfitting. (Figure: three fits.) Underfit: y = θ_0 + θ_1 x; looks good: y = θ_0 + θ_1 x + θ_2 x^2; overfit: y = Σ_{j=0}^5 θ_j x^j. Generalisation: learn a function / hypothesis from past data in order to explain, predict, model or control new data examples. Use K-fold cross-validation!
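The three regimes can be reproduced with polynomial fits of increasing degree (a toy setup of my own; the degrees differ from the slide's). Because higher-degree models nest lower-degree ones, training error only decreases with degree, which is exactly why training error alone cannot detect overfitting and K-fold cross-validation is needed:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

def train_rmse(x, y, degree):
    """Training RMSE of a least-squares polynomial fit of the given degree."""
    X = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((y - X @ coef) ** 2)))

rmse_underfit = train_rmse(x, y, 1)    # straight line: too simple
rmse_good = train_rmse(x, y, 3)        # cubic: roughly right for one sine period
rmse_overfit = train_rmse(x, y, 15)    # very flexible: chases the noise
```

Held-out error, unlike the training RMSE computed here, would rise again for the degree-15 fit; that gap is what cross-validation measures.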
50 Overfitting and underfitting. (Figure: an overfit model.)
51 Overfitting: a regularizer is an additional criterion added to the loss function to make sure that we don't overfit. It's called a regularizer since it tries to keep the parameters more normal/regular. It is a bias on the model: it forces the learning to prefer certain types of weights over others, e.g., β_ridge = argmin_β Σ_{i=1}^n (y_i - x_i^T β)^2 + λ β^T β.
52 Common regularizers. L2, Σ_j β_j^2: squared weights, penalizes large values more; L1, Σ_j |β_j|: sum of absolute weights, penalizes small values more (relative to L2). Generally we don't want huge weights: if weights are large, a small change in a feature can result in a large change in the prediction. We might also prefer weights of exactly 0 for features that aren't useful.
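A two-weight toy example (numbers mine) makes the contrast concrete: both vectors below get the same L1 penalty, but L2 charges the concentrated one twice as much, so L2 discourages individual large weights while L1 is indifferent to how the magnitude is spread:

```python
import numpy as np

concentrated = np.array([2.0, 0.0])   # one large weight
spread = np.array([1.0, 1.0])         # same total magnitude, spread out

l1_concentrated = np.abs(concentrated).sum()   # 2.0
l1_spread = np.abs(spread).sum()               # 2.0 -> L1 cannot tell them apart
l2_concentrated = (concentrated ** 2).sum()    # 4.0
l2_spread = (spread ** 2).sum()                # 2.0 -> L2 prefers spreading out
```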
53 Summary: regularized multivariate linear regression. Model: Ŷ = β_0 + β_1 x_1 + ... + β_p x_p. LR estimation: argmin Σ_{i=1}^n (Y_i - Ŷ_i)^2. LASSO estimation: argmin Σ_{i=1}^n (Y_i - Ŷ_i)^2 + λ Σ_{j=1}^p |β_j|. Ridge regression estimation: argmin Σ_{i=1}^n (Y_i - Ŷ_i)^2 + λ Σ_{j=1}^p β_j^2.
54 More: a family of shrinkage estimators. β = argmin_β Σ_{i=1}^N (y_i - x_i^T β)^2 subject to Σ_j |β_j|^q ≤ s, for q ≥ 0; contours of constant value of Σ_j |β_j|^q are shown for the case of two inputs (figure).
55 p-norms visualized (figure): all p-norms penalize larger weights; q < 2 tends to create sparse solutions (i.e., lots of 0 weights), while q > 2 tends to push for similar weights.
56 Regularization path of a ridge regression (figure: coefficient values as λ varies; λ = 0 gives the least-squares solution).
57 Regularization path of a ridge regression, a.k.a. weight decay (figure: an example with 8 features; λ = 0 gives the least-squares solution).
58 Regularization path of a lasso regression: when varying λ, how β_j varies (figure: an example with 8 features; λ = 0 gives the least-squares solution).
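A ridge path like the ones pictured can be traced directly from the closed-form solution (synthetic data of mine, with 8 features as in the slides' example; the monotone shrinkage of the coefficient norm as λ grows is a known property of ridge):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 8
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
# One coefficient vector per lambda; lambda = 0 is the least-squares solution.
path = [np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y) for lam in lambdas]
norms = [float(np.linalg.norm(b)) for b in path]   # shrinks as lambda grows
```

Plotting each coefficient in `path` against λ reproduces the path figures; note that ridge coefficients shrink toward zero but, unlike the lasso, do not hit exactly zero.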
59 WHY and how to select λ? 1. Generalization ability → use k-fold CV to decide. 2. To control the bias and variance of the model (details in future lectures).
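Point 1 can be sketched as a small k-fold loop over a grid of λ values (illustrative code with synthetic data; all names here are my own, and in practice a library routine such as scikit-learn's `RidgeCV` plays this role):

```python
import numpy as np

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5):
    """Mean held-out MSE of ridge over k contiguous folds (no shuffling, for brevity)."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    errs = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        beta = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(float(np.mean((y[val_idx] - X[val_idx] @ beta) ** 2)))
    return float(np.mean(errs))

rng = np.random.default_rng(6)
n, p = 80, 20
X = rng.normal(size=(n, p))
y = X @ np.concatenate([np.ones(3), np.zeros(p - 3)]) + 0.5 * rng.normal(size=n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_mse(X, y, lam))
```

Each candidate λ is scored only on data the corresponding fit never saw, so the selected `best_lam` reflects generalization rather than training error.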
60 Regression: complexity versus goodness of fit. (Figure: training data; too simple? → low variance / high bias; too complex? → low bias / high variance; about right?) What ultimately matters: GENERALIZATION.
61 An example of ridge regression: choose the λ that generalizes well! When varying λ, how β_j varies (figure: an example with 8 features; λ = 0 at one end, λ increasing).
62 Choose the λ that generalizes well! When varying λ, how β_j varies (figure: an example with 8 features; λ = 0 at one end).
63 Today recap: Linear Regression Model with Regularizations. ✓ Review: (Ordinary) least squares: squared loss (Normal Equation); ✓ Ridge regression: squared loss with L2 regularization; ✓ Lasso regression: squared loss with L1 regularization; ✓ Elastic-net regression: squared loss with L1 AND L2 regularization; ✓ influence of the regularization parameter.
64 References: big thanks to Prof. Eric Xing (CMU) for allowing me to reuse some of his slides; Prof. Nando de Freitas's tutorial slides; "Regularization and variable selection via the elastic net", Hui Zou and Trevor Hastie, Stanford University, USA; the ESL book, Elements of Statistical Learning.
65 More on regularized regression: see the L6-extra slides; the relationship between λ and s: see the L6-extra slides; why the elastic net has a few nice properties: see the L6-extra slides.
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationCompressed Sensing in Cancer Biology? (A Work in Progress)
Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationLinear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationHigh-Dimensional Statistical Learning: Introduction
Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods
More informationOutline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren
1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationLeast Square Es?ma?on, Filtering, and Predic?on: ECE 5/639 Sta?s?cal Signal Processing II: Linear Es?ma?on
Least Square Es?ma?on, Filtering, and Predic?on: Sta?s?cal Signal Processing II: Linear Es?ma?on Eric Wan, Ph.D. Fall 2015 1 Mo?va?ons If the second-order sta?s?cs are known, the op?mum es?mator is given
More informationSTAD68: Machine Learning
STAD68: Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 1 Evalua;on 3 Assignments worth 40%. Midterm worth 20%. Final
More informationParameter Norm Penalties. Sargur N. Srihari
Parameter Norm Penalties Sargur N. srihari@cedar.buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationMachine Learning for Biomedical Engineering. Enrico Grisan
Machine Learning for Biomedical Engineering Enrico Grisan enrico.grisan@dei.unipd.it Curse of dimensionality Why are more features bad? Redundant features (useless or confounding) Hard to interpret and
More information