UVA CS 6316, Fall 2015 Graduate: Machine Learning. Lecture 5: Non-Linear Regression Models


UVA CS 6316, Fall 2015 Graduate: Machine Learning
Lecture 5: Non-Linear Regression Models
Dr. Yanjun Qi, University of Virginia, Department of Computer Science

Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

Today: Regression (supervised)
- Four ways to train / perform optimization for linear regression models:
  - Normal equation
  - Gradient descent (GD)
  - Stochastic GD
  - Newton's method
- Supervised regression models:
  - Linear regression (LR)
  - LR with non-linear basis functions
  - Locally weighted LR
  - LR with regularizations

Today:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)

Traditional programming: Data + Program -> Computer -> Output.
Machine learning: Data + Output -> Computer -> Program.

Machine learning in a nutshell: Task, Representation, Score function, Search/Optimization, Models & Parameters. ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.

(1) Multivariate linear regression in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of the x's
- Score function: least squares
- Search/Optimization: linear algebra / GD / SGD (a gradient-descent sketch follows below)
- Models, Parameters: regression coefficients

$\hat{y} = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

Today:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)
- Linear regression model with regularizations:
  - Ridge regression
  - Lasso regression
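To make the search step concrete, here is a minimal NumPy sketch of batch gradient descent for this model; the dataset, step size, and iteration count are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-feature data; a leading column of ones absorbs the intercept theta_0.
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 2))])
true_theta = np.array([1.0, 2.0, -0.5])        # assumed "ground truth" for the demo
y = X @ true_theta + 0.1 * rng.standard_normal(100)

theta = np.zeros(3)
alpha = 0.1                                    # step size (an assumed value)
for _ in range(500):                           # batch gradient descent
    grad = X.T @ (X @ theta - y) / len(y)      # gradient of the average squared-error cost
    theta -= alpha * grad

# theta now approximates the normal-equation solution (X^T X)^{-1} X^T y.
```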

LR with non-linear basis functions

LR does not mean we can only deal with linear relationships. We can write

$y = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \theta^T \varphi(x)$

We are free to design (non-linear) features (e.g., basis-function derived) under LR, where the $\varphi_j(x)$ are fixed basis functions (and we define $\varphi_0(x) = 1$).

E.g. (1) polynomial regression: introduce the basis functions $\varphi(x) := (1, x, x^2, x^3)^T$; the fitted coefficients are again given by the normal equation $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$. (Dr. Nando de Freitas's tutorial slide.)
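A minimal sketch of this polynomial fit, assuming a small toy dataset (the slides do not specify one); it builds the cubic basis expansion and solves the normal equation directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data (purely illustrative).
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Basis expansion phi(x) = (1, x, x^2, x^3)^T, one row per training point.
Phi = np.vander(x, N=4, increasing=True)

# Normal equation on the expanded design matrix: theta* = (Phi^T Phi)^{-1} Phi^T y.
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Prediction is still linear in theta even though it is cubic in x.
y_hat = Phi @ theta
```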

e.g. (1) polynomial regression. KEY: if the bases are given, the problem of learning the parameters is still linear.

Many possible basis functions, e.g.:
- Polynomial: $\varphi_j(x) = x^{j-1}$
- Radial basis functions: $\varphi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)$
- Sigmoidal: $\varphi_j(x) = \sigma\left(\frac{x - \mu_j}{s}\right)$
- Splines, Fourier, wavelets, etc.

e.g. (2) LR with radial basis functions. E.g., RBF regression:

$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$

$\varphi(x) := \left(1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)\right)^T$

$\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$

RBF = radial basis function: a function which depends only on the radial distance from a centre point. Gaussian RBF: as the distance from the centre $r$ increases, the output of the RBF decreases:

$K_\lambda(x, r) = \exp\left(-\frac{(x - r)^2}{2 \lambda^2}\right)$

[Figures: 1D case and 2D case.]
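The same normal-equation machinery works once the features are RBFs. Below is a hedged sketch; the four centres r_j and the shared width lambda are assumed values, since the slide only shows them pictorially:

```python
import numpy as np

def rbf_features(x, centres, lam):
    """Build phi(x) = (1, K_lam(x, r_1), ..., K_lam(x, r_m))^T, one row per point."""
    K = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * lam ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), K])

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Four predefined centres and a shared width (the actual numbers are assumptions).
centres = np.array([-0.75, -0.25, 0.25, 0.75])
lam = 0.3

Phi = rbf_features(x, centres, lam)
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # same normal equation as before
y_hat = Phi @ theta
```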

e.g. another linear regression with 1D RBF basis functions (4 predefined centres and widths):

$\varphi(x) := \left(1,\; K_{\lambda_1}(x, r_1),\; K_{\lambda_2}(x, r_2),\; K_{\lambda_3}(x, r_3),\; K_{\lambda_4}(x, r_4)\right)^T$, $\theta^* = (\Phi^T \Phi)^{-1} \Phi^T \vec{y}$

e.g. a LR with 1D RBFs (3 predefined centres and widths). [Figures: the 1D RBF bases, and the curve after fitting.]

e.g. 2D good and bad RBFs. [Figures: a good 2D RBF; two bad 2D RBFs.]

Two main issues:
- Learning the parameter $\theta$: almost the same as LR, just change $X$ to $\varphi(x)$, a linear combination of basis functions (that can be non-linear).
- How to choose the model order, e.g., the polynomial degree for polynomial regression.

Issue: overfitting and underfitting.
- Underfit: $y = \theta_0 + \theta_1 x$
- Looks good: $y = \theta_0 + \theta_1 x + \theta_2 x^2$
- Overfit: $y = \sum_{j=0}^{5} \theta_j x^j$

Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples. Use K-fold cross validation (see the sketch after this slide)!

(2) Multivariate linear regression with basis expansion, in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of (basis expansion of x)
- Score function: least squares
- Search/Optimization: linear algebra
- Models, Parameters: regression coefficients

$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$
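A small sketch of K-fold cross validation used to pick the polynomial degree, as the slide suggests; the fold count, candidate degrees, and toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.shape)

def kfold_mse(x, y, degree, k=5):
    """Average held-out squared error of a degree-`degree` polynomial fit."""
    idx = rng.permutation(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Phi_tr = np.vander(x[train], degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(Phi_tr, y[train], rcond=None)
        Phi_te = np.vander(x[fold], degree + 1, increasing=True)
        errs.append(np.mean((Phi_te @ theta - y[fold]) ** 2))
    return float(np.mean(errs))

# Choose the model order with the smallest cross-validated error.
best_degree = min(range(1, 8), key=lambda d: kfold_mse(x, y, d))
```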

11 Today " MachineLearningMethodinanutshell " RegressionModelsBeyondLinear LRwithnonWlinearbasisfuncHons Locallyweightedlinearregression RegressiontreesandMulHlinear InterpolaHon(later) 9/1/15 1 Locallyweightedlinearregression Thealgorithm: Insteadofminimizing nowwefittominimize J(θ) = 1 n θ w i (x it θ y i ) i=1 Wheredow i 'scomefrom? w i = K(x i,x )= exp (x x i ) τ! wherex_isthequerypointforwhichwe'dliketoknowitscorresponding y #EssenHallyweputhigherweightson(thoseerrors from)trainingexamplesthatareclosetothequery pointx_(thanthosethatarefurtherawayfromthe 9/1/15 querypoint) n 1 T J ( θ ) = ( xi θ yi ) i= 1

Locally weighted regression, aka locally weighted linear regression, LOESS. A linear fit of x -> y weighted by $K_\lambda(x_i, x_0)$ can represent just the neighbourhood of $x_0$: we use the RBF function to pick out / emphasize the neighbour region of the query point $x_0$. [Figure: locally weighted linear regression.]

Locally weighted linear regression: a separate weighted least squares problem at each target point $x_0$:

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

LEARNING of locally weighted linear regression: a separate weighted least squares fit at each target point $x_0$.

Locally weighted linear regression, e.g., when there is only one feature variable. Separate weighted least squares at each target point $x_0$:

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$, with $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

Let $b(x)^T = (1, x)$; let $B$ be the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$; and let $W(x_0) = \mathrm{diag}\left(K_\lambda(x_i, x_0)\right)$, $i = 1, \dots, N$, be the $N \times N$ weight matrix. Then

LWR: $\hat{f}(x_0) = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0) y$

versus LR: $\hat{f}(x_q) = x_q^T \theta^* = x_q^T (X^T X)^{-1} X^T \vec{y}$

More: local weighted polynomial regression, i.e., local polynomial fits of any degree $d$:

$\min_{\alpha(x_0), \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0) x_i^j \right]^2$

$\hat{f}(x_0) = \hat{\alpha}(x_0) + \sum_{j=1}^{d} \hat{\beta}_j(x_0) x_0^j$

[Figure: blue = true function, green = estimated.]
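A minimal sketch of the closed-form LWR predictor above for the one-feature case; the Gaussian kernel bandwidth and the toy data are assumed. Note that the weighted system is re-solved at every query point:

```python
import numpy as np

def lwr_predict(x0, x, y, lam=0.3):
    """f_hat(x0) = b(x0)^T (B^T W(x0) B)^{-1} B^T W(x0) y, for 1-D inputs."""
    B = np.column_stack([np.ones_like(x), x])        # i-th row is b(x_i)^T = (1, x_i)
    w = np.exp(-(x - x0) ** 2 / (2.0 * lam ** 2))    # K_lam(x_i, x0) for every i
    W = np.diag(w)                                   # N x N diagonal weight matrix
    ab = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)   # local (alpha(x0), beta(x0))
    return np.array([1.0, x0]) @ ab

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 50)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(x.shape)

# A fresh weighted fit is solved at every query point: the non-parametric cost.
f_hat = np.array([lwr_predict(x0, x, y) for x0 in x])
```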

Parametric vs. non-parametric. Locally weighted linear regression is a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm because it has a fixed, finite number of parameters (the $\theta$), which are fit to the data; once we've fit the $\theta$ and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of knowledge we need to keep in order to represent the hypothesis grows linearly with the size of the training set.

(3) Locally weighted / kernel linear regression, in a nutshell:
- Task: regression
- Representation: y = weighted linear sum of the x's
- Score function: weighted least squares
- Search/Optimization: linear algebra
- Models, Parameters: local regression coefficients (conditioned on each test point)

$\min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2$, $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0) x_0$

Today's recap:
- Machine learning method in a nutshell
- Regression models beyond linear:
  - LR with non-linear basis functions
  - Locally weighted linear regression
  - Regression trees and multilinear interpolation (later)

Probabilistic interpretation of linear regression (LATER). Let us assume that the target variable and the inputs are related by the equation

$y_i = \theta^T x_i + \varepsilon_i$

where $\varepsilon_i$ is an error term of unmodeled effects or random noise. Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma^2)$; then we have

$p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

By the independence (among samples) assumption:

$L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2}\right)$

Many more variations of LR follow from this perspective, e.g., binomial/Poisson (LATER).
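The slide stops at the likelihood; the standard next step (not shown here) is to take logs, which makes the link to least squares explicit:

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)
  = n \log \frac{1}{\sqrt{2\pi}\,\sigma}
    - \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \left( y_i - \theta^{T} x_i \right)^{2}
```

The first term is constant in $\theta$, so maximizing the log-likelihood is equivalent to minimizing the least-squares cost $J(\theta) = \frac{1}{2} \sum_{i} (y_i - \theta^T x_i)^2$.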

References:
- ... for allowing me to reuse some of his slides
- Prof. Nando de Freitas's tutorial slides
