Linear Regression. Introduction to Machine Learning. Matt Gormley, Lecture 5, September 14, 2016. Readings: Bishop 3.1
1 School of Computer Science. Introduction to Machine Learning: Linear Regression. Readings: Bishop 3.1. Matt Gormley, Lecture 5, September 14, 2016
2 Homework 2: Reminders. Extension: due Friday (9/16) at 5:30pm. Recitation schedule posted on course website.
3 Outline. Linear Regression: Simple example; Model; Learning: Gradient Descent, SGD, Closed Form. Advanced Topics: Geometric and Probabilistic Interpretation of LMS; L2 Regularization; L1 Regularization; Features
4 Outline. Linear Regression: Simple example; Model; Learning (aka. Least Squares): Gradient Descent, SGD (aka. Least Mean Squares (LMS)), Closed Form (aka. Normal Equations). Advanced Topics: Geometric and Probabilistic Interpretation of LMS; L2 Regularization (aka. Ridge Regression); L1 Regularization (aka. LASSO); Features (aka. non-linear basis functions)
5 Linear regression. Our goal is to estimate $w$ from training data of $\langle x_i, y_i \rangle$ pairs: $Y = wX + \epsilon$. Optimization goal: minimize squared error (least squares): $\arg\min_w \sum_i (y_i - w x_i)^2$. Why least squares? - minimizes squared distance between measurements and the predicted line (see HW) - has a nice probabilistic interpretation - the math is pretty
6 Solving linear regression. To optimize in closed form, we just take the derivative w.r.t. $w$ and set it to 0: $\frac{\partial}{\partial w} \sum_i (y_i - w x_i)^2 = -2 \sum_i x_i (y_i - w x_i) = 0 \;\Rightarrow\; \sum_i x_i y_i = w \sum_i x_i^2 \;\Rightarrow\; w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$
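A minimal numpy sketch of this closed form (the function name and synthetic data are my own, for illustration):

import numpy as np

def fit_origin(x, y):
    # Closed-form least squares for y = w*x (no intercept):
    # w = sum_i(x_i * y_i) / sum_i(x_i^2)
    return np.dot(x, y) / np.dot(x, x)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + rng.normal(0, 0.1, size=100)
print(fit_origin(x, y))  # close to the generating slope 2.0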
7 Linear regression. Given an input $x$ we would like to compute an output $y$. In linear regression we assume that $y$ (what we are trying to predict) and $x$ (the observed values) are related by the equation $y = wx + \epsilon$, where $w$ is a parameter and $\epsilon$ represents measurement or other noise.
8 Regression example. (Plot: data generated with $w = 2$, noise std = 1; recovered $\hat{w} = 2.03$.)
9 Regression example. (Plot: same setup with noise std = 2; recovered $\hat{w} = 2.05$.)
10 Regression example. (Plot: same setup with noise std = 4; recovered $\hat{w} = 2.08$.)
11 Bias term. So far we assumed that the line passes through the origin. What if the line does not? No problem: simply change the model to $y = w_0 + w_1 x + \epsilon$. Can use least squares to determine $w_0, w_1$: $w_0 = \frac{\sum_i y_i - w_1 \sum_i x_i}{n}$, $\quad w_1 = \frac{\sum_i x_i (y_i - w_0)}{\sum_i x_i^2}$
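A sketch of the same fit with an intercept; the two coupled equations above are solved jointly here, yielding the familiar decoupled formulas (names and data are illustrative):

import numpy as np

def fit_with_bias(x, y):
    # Closed-form least squares for y = w0 + w1*x, obtained by solving the
    # two stationarity equations for w0 and w1 simultaneously.
    n = len(x)
    w1 = (np.dot(x, y) - x.sum() * y.sum() / n) / (np.dot(x, x) - x.sum() ** 2 / n)
    w0 = (y.sum() - w1 * x.sum()) / n
    return w0, w1

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=200)
y = 1.5 + 2.0 * x + rng.normal(0, 0.2, size=200)
print(fit_with_bias(x, y))  # close to (1.5, 2.0)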
12 Linear Regression. Data: Inputs are continuous vectors of length $K$; outputs are continuous scalars. $\mathcal{D} = \{ (x^{(i)}, y^{(i)}) \}_{i=1}^N$ where $x \in \mathbb{R}^K$ and $y \in \mathbb{R}$. What are some example problems of this form?
13 Linear Regression. Data: Inputs are continuous vectors of length $K$; outputs are continuous scalars. $\mathcal{D} = \{ (x^{(i)}, y^{(i)}) \}_{i=1}^N$ where $x \in \mathbb{R}^K$ and $y \in \mathbb{R}$. Prediction: Output is a linear function of the inputs: $\hat{y} = h_\theta(x) = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_K x_K = \theta^T x$ (we assume $x_1$ is 1). Learning: finds the parameters that minimize some objective function, $\theta^* = \operatorname{argmin}_\theta J(\theta)$.
14 Least Squares. Learning: finds the parameters that minimize some objective function, $\theta^* = \operatorname{argmin}_\theta J(\theta)$. We minimize the sum of the squares: $J(\theta) = \frac{1}{2} \sum_{i=1}^N (\theta^T x^{(i)} - y^{(i)})^2$. Why? 1. Reduces distance between true measurements and the predicted hyperplane (a line in 1D). 2. Has a nice probabilistic interpretation.
15 Least Squares. Learning: finds the parameters that minimize some objective function, $\theta^* = \operatorname{argmin}_\theta J(\theta)$. We minimize the sum of the squares: $J(\theta) = \frac{1}{2} \sum_{i=1}^N (\theta^T x^{(i)} - y^{(i)})^2$. Why? 1. Reduces distance between true measurements and the predicted hyperplane (a line in 1D). 2. Has a nice probabilistic interpretation. Aside: this is a very general optimization setup; we could solve it in lots of ways. Today, we'll consider three ways.
16 Least Squares. Learning: Three approaches to solving $\theta^* = \operatorname{argmin}_\theta J(\theta)$. Approach 1: Gradient Descent (take larger, more certain steps opposite the gradient). Approach 2: Stochastic Gradient Descent (SGD) (take many small steps opposite the gradient). Approach 3: Closed Form (set derivatives equal to zero and solve for parameters).
17 Gradient Descent. Algorithm 1 Gradient Descent: 1: procedure GD($\mathcal{D}$, $\theta^{(0)}$); 2: $\theta \leftarrow \theta^{(0)}$; 3: while not converged do; 4: $\theta \leftarrow \theta - \lambda \nabla_\theta J(\theta)$; 5: return $\theta$. In order to apply GD to Linear Regression all we need is the gradient of the objective function (i.e. the vector of partial derivatives): $\nabla_\theta J(\theta) = \left[ \tfrac{d}{d\theta_1} J(\theta),\ \tfrac{d}{d\theta_2} J(\theta),\ \dots,\ \tfrac{d}{d\theta_K} J(\theta) \right]^T$.
18 Gradient Descent. (Algorithm 1 as above.) There are many possible ways to detect convergence. For example, we could check whether the L2 norm of the gradient is below some small tolerance: $\| \nabla_\theta J(\theta) \|_2 \le \epsilon$. Alternatively, we could check that the reduction in the objective function from one iteration to the next is small.
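A batch gradient descent sketch using the L2-norm-of-the-gradient convergence test just described (the step size, tolerance, and data are arbitrary choices of mine, not from the slides):

import numpy as np

def gradient_descent(X, y, lr=0.01, tol=1e-6, max_iter=10000):
    # Minimizes J(theta) = 0.5 * sum_i (theta^T x_i - y_i)^2 by full-batch steps.
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y)      # gradient summed over all N examples
        if np.linalg.norm(grad) < tol:    # convergence check from the slide
            break
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # x_1 = 1 plays the bias role
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, size=50)
print(gradient_descent(X, y))  # close to [1.0, 2.0]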
19 Stochastic Gradient Descent (SGD). Algorithm 2 Stochastic Gradient Descent (SGD): 1: procedure SGD($\mathcal{D}$, $\theta^{(0)}$); 2: $\theta \leftarrow \theta^{(0)}$; 3: while not converged do; 4: for $i \in \text{shuffle}(\{1, 2, \dots, N\})$ do; 5: $\theta \leftarrow \theta - \lambda \nabla_\theta J^{(i)}(\theta)$; 6: return $\theta$. Applied to Linear Regression, SGD is called the Least Mean Squares (LMS) algorithm. We need a per-example objective: let $J(\theta) = \sum_{i=1}^N J^{(i)}(\theta)$ where $J^{(i)}(\theta) = \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2$.
20 Stochastic Gradient Descent (SGD). Algorithm 2 Stochastic Gradient Descent (SGD), written per coordinate: 1: procedure SGD($\mathcal{D}$, $\theta^{(0)}$); 2: $\theta \leftarrow \theta^{(0)}$; 3: while not converged do; 4: for $i \in \text{shuffle}(\{1, 2, \dots, N\})$ do; 5: for $k \in \{1, 2, \dots, K\}$ do; 6: $\theta_k \leftarrow \theta_k - \lambda \frac{d}{d\theta_k} J^{(i)}(\theta)$; 7: return $\theta$. Applied to Linear Regression, SGD is called the Least Mean Squares (LMS) algorithm. We need a per-example objective: let $J(\theta) = \sum_{i=1}^N J^{(i)}(\theta)$ where $J^{(i)}(\theta) = \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2$.
21 Stochastic Gradient Descent (SGD). (Algorithm 2 as above.) Applied to Linear Regression, SGD is called the Least Mean Squares (LMS) algorithm. We need a per-example objective: let $J(\theta) = \sum_{i=1}^N J^{(i)}(\theta)$ where $J^{(i)}(\theta) = \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2$. Let's start by calculating this partial derivative for the Linear Regression objective function.
22 Partial Derivatives for Linear Reg. Let $J(\theta) = \sum_{i=1}^N J^{(i)}(\theta)$ where $J^{(i)}(\theta) = \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2$. Then: $\frac{d}{d\theta_k} J^{(i)}(\theta) = \frac{d}{d\theta_k} \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2 = (\theta^T x^{(i)} - y^{(i)}) \, \frac{d}{d\theta_k} (\theta^T x^{(i)} - y^{(i)}) = (\theta^T x^{(i)} - y^{(i)}) \, \frac{d}{d\theta_k} \Big( \sum_{k'=1}^K \theta_{k'} x_{k'}^{(i)} - y^{(i)} \Big) = (\theta^T x^{(i)} - y^{(i)}) \, x_k^{(i)}$
23 Partial Derivatives for Linear Reg. Let $J(\theta) = \sum_{i=1}^N J^{(i)}(\theta)$ where $J^{(i)}(\theta) = \frac{1}{2} (\theta^T x^{(i)} - y^{(i)})^2$. Then: $\frac{d}{d\theta_k} J^{(i)}(\theta) = (\theta^T x^{(i)} - y^{(i)}) \, x_k^{(i)}$ (used by SGD, aka. LMS), and $\frac{d}{d\theta_k} J(\theta) = \sum_{i=1}^N \frac{d}{d\theta_k} J^{(i)}(\theta) = \sum_{i=1}^N (\theta^T x^{(i)} - y^{(i)}) \, x_k^{(i)}$ (used by Gradient Descent).
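A quick way to validate this derivation is to compare the analytic per-example gradient against central finite differences; this check is my own addition, not part of the slides:

import numpy as np

def per_example_grad(theta, x_i, y_i):
    # Analytic gradient of J^(i)(theta) = 0.5 * (theta^T x_i - y_i)^2
    return (theta @ x_i - y_i) * x_i

def numeric_grad(theta, x_i, y_i, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    J = lambda t: 0.5 * (t @ x_i - y_i) ** 2
    g = np.zeros_like(theta)
    for k in range(len(theta)):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
theta, x_i, y_i = rng.normal(size=3), rng.normal(size=3), 0.7
print(np.allclose(per_example_grad(theta, x_i, y_i),
                  numeric_grad(theta, x_i, y_i), atol=1e-5))  # True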
24 Least Mean Squares (LMS). Algorithm 3 Least Mean Squares (LMS): 1: procedure LMS($\mathcal{D}$, $\theta^{(0)}$); 2: $\theta \leftarrow \theta^{(0)}$; 3: while not converged do; 4: for $i \in \text{shuffle}(\{1, 2, \dots, N\})$ do; 5: for $k \in \{1, 2, \dots, K\}$ do; 6: $\theta_k \leftarrow \theta_k - \lambda (\theta^T x^{(i)} - y^{(i)}) \, x_k^{(i)}$; 7: return $\theta$. Applied to Linear Regression, SGD is called the Least Mean Squares (LMS) algorithm.
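A compact LMS sketch following Algorithm 3, with the inner loop over k vectorized into a single numpy update (the epoch count and step size are illustrative choices of mine):

import numpy as np

def lms(X, y, lr=0.01, epochs=50, seed=0):
    # SGD for linear regression: one small step per shuffled example,
    # theta <- theta - lr * (theta^T x_i - y_i) * x_i  (all k at once).
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):     # shuffle({1, ..., N})
            theta -= lr * (theta @ X[i] - y[i]) * X[i]
    return theta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, size=200)
print(lms(X, y))  # hovers near [1.0, 2.0]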
25 Optimization for Linear Reg. vs. Logistic Reg. Can use the same tricks for both: regularization; tuning the learning rate on development data; shuffling examples out-of-core (if they can't fit in memory) and streaming over them; local hill climbing yields the global optimum (both problems are convex); etc. But Logistic Regression does not have a closed-form solution for the MLE parameters. What about Linear Regression?
26 Least Squares. Learning: Three approaches to solving $\theta^* = \operatorname{argmin}_\theta J(\theta)$. Approach 1: Gradient Descent (take larger, more certain steps opposite the gradient). Approach 2: Stochastic Gradient Descent (SGD) (take many small steps opposite the gradient). Approach 3: Closed Form (set derivatives equal to zero and solve for parameters).
27 The normal equations. Write the cost function in matrix form: $J(\theta) = \frac{1}{2} \sum_{i=1}^n (x_i^T \theta - y_i)^2 = \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2} \left( \theta^T X^T X \theta - \theta^T X^T \vec{y} - \vec{y}^T X \theta + \vec{y}^T \vec{y} \right)$, where $X$ stacks the inputs as rows, $X = [x_1^T; x_2^T; \dots; x_n^T]$, and $\vec{y} = [y_1, \dots, y_n]^T$. To minimize $J(\theta)$, take the derivative and set it to zero (using the trace identities on the next slide): $\nabla_\theta J = X^T X \theta - X^T \vec{y} = 0$. This gives the normal equations $X^T X \theta = X^T \vec{y}$, with solution $\theta^* = (X^T X)^{-1} X^T \vec{y}$.
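A numpy sketch of this solution on synthetic data of my choosing; numerically, a linear solve (or QR-based least squares) is preferable to forming the explicit inverse:

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(0, 0.1, size=100)

theta = np.linalg.solve(X.T @ X, X.T @ y)         # normal equations X^T X theta = X^T y
theta_qr, *_ = np.linalg.lstsq(X, y, rcond=None)  # numerically safer route
print(theta, theta_qr)  # both close to [0.5, 1.0, -2.0]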
28 Some matrix derivatives. For $f : \mathbb{R}^{m \times n} \to \mathbb{R}$, define $\nabla_A f(A)$ to be the $m \times n$ matrix with entries $[\nabla_A f(A)]_{ij} = \frac{\partial f}{\partial A_{ij}}$. Trace: $\operatorname{tr} A = \sum_{i} A_{ii}$ for square $A$; $\operatorname{tr} a = a$ for a scalar $a$; $\operatorname{tr} ABC = \operatorname{tr} CAB = \operatorname{tr} BCA$. Some facts of matrix derivatives (without proof): $\nabla_A \operatorname{tr} AB = B^T$; $\nabla_A \operatorname{tr} ABA^T C = CAB + C^T A B^T$; $\nabla_A |A| = |A| (A^{-1})^T$.
29 Comments on the normal equation. In most situations of practical interest, the number of data points $N$ is larger than the dimensionality $k$ of the input space and the matrix $X$ is of full column rank. If this condition holds, then it is easy to verify that $X^T X$ is necessarily invertible. The assumption that $X^T X$ is invertible implies that it is positive definite, thus the critical point we have found is a minimum. What if $X$ has less than full column rank? Then we need regularization (later).
30 Direct and Iterative methods. Direct methods: we can achieve the solution in a single step by solving the normal equations. Using Gaussian elimination or QR decomposition, we converge in a finite number of steps. This can be infeasible when data are streaming in in real time, or when the data are extremely large. Iterative methods: stochastic or steepest gradient. They converge only in a limiting sense, but are more attractive in large practical problems. Caution is needed in deciding the learning rate $\alpha$.
31 Convergence rate. Theorem: the steepest descent algorithm converges to the minimum of the cost characterized by the normal equations, provided the step size $\alpha$ is sufficiently small. A formal analysis of LMS needs more mathematical muscle; in practice, one can use a small $\alpha$, or gradually decrease $\alpha$.
32 Convergence Curves. (Figure: training MSE vs. iteration for the batch and online updates.) For the batch method, the training MSE is initially large due to the uninformed initialization. In the online update, the $N$ updates per epoch reduce the MSE to a much smaller value.
33 Least Squares. Learning: Three approaches to solving $\theta^* = \operatorname{argmin}_\theta J(\theta)$. Approach 1: Gradient Descent (take larger, more certain steps opposite the gradient). Pros: conceptually simple, guaranteed convergence. Cons: batch, often slow to converge. Approach 2: Stochastic Gradient Descent (SGD) (take many small steps opposite the gradient). Pros: memory efficient, fast convergence, less prone to local optima. Cons: convergence in practice requires tuning and fancier variants. Approach 3: Closed Form (set derivatives equal to zero and solve for parameters). Pros: one-shot algorithm! Cons: does not scale to large datasets (the matrix inverse is the bottleneck).
34 Matching Game. Goal: Match the Algorithm to its Update Rule. Algorithms: 1. SGD for Logistic Regression, $h_\theta(x) = p(y \mid x)$. 2. Least Mean Squares, $h_\theta(x) = \theta^T x$. 3. Perceptron (next lecture), $h_\theta(x) = \operatorname{sgn}(\theta^T x)$. Update rules 4-6: each has the form $\theta_k \leftarrow \theta_k + \lambda \,(h_\theta(x^{(i)}) - y^{(i)})\, x_k^{(i)}$, one with an additional $\exp$ term (the exact rules did not survive transcription). Answer choices: A. 1=5, 2=4, 3=6; B. 1=5, 2=6, 3=4; C. 1=6, 2=4, 3=4; D. 1=5, 2=6, 3=6; E. 1=6, 2=6, 3=6.
35 Geometric Interpretation of LMS. The predictions on the training data are $\hat{\vec{y}} = X \theta^* = X (X^T X)^{-1} X^T \vec{y}$. Note that $\hat{\vec{y}} - \vec{y} = \left( X (X^T X)^{-1} X^T - I \right) \vec{y}$, and $X^T (\hat{\vec{y}} - \vec{y}) = X^T \left( X (X^T X)^{-1} X^T - I \right) \vec{y} = \left( X^T X (X^T X)^{-1} X^T - X^T \right) \vec{y} = 0$. So $\hat{\vec{y}}$ is the orthogonal projection of $\vec{y}$ onto the space spanned by the columns of $X$ (where $X$ stacks the $x_i^T$ as rows and $\vec{y} = [y_1, \dots, y_n]^T$).
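The orthogonality claim is easy to verify numerically; the random data here are illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = rng.normal(size=30)

theta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ theta                          # y_hat = X (X^T X)^{-1} X^T y
print(np.allclose(X.T @ (y_hat - y), 0))   # True: residual is orthogonal to col(X)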
36 Probabilistic Interpretation of LMS. Let us assume that the target variable and the inputs are related by the equation $y_i = \theta^T x_i + \epsilon_i$, where $\epsilon$ is an error term of unmodeled effects or random noise. Now assume that $\epsilon$ follows a Gaussian $N(0, \sigma^2)$; then we have: $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)$. By the independence assumption: $L(\theta) = \prod_{i=1}^n p(y_i \mid x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left( -\frac{\sum_{i=1}^n (y_i - \theta^T x_i)^2}{2\sigma^2} \right)$.
37 Probabilistic Interpretation of LMS, cont. Hence the log-likelihood is: $\ell(\theta) = n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \theta^T x_i)^2$. Do you recognize the last term? Yes, it is: $J(\theta) = \frac{1}{2} \sum_{i=1}^n (x_i^T \theta - y_i)^2$. Thus, under the independence assumption, LMS is equivalent to MLE of $\theta$!
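A tiny numeric illustration of this equivalence (all values here are synthetic, my own choices): the negative log-likelihood differs from $J(\theta)$ only by an additive constant and a positive scale, so the two share the same minimizer in $\theta$.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(0, 0.5, size=40)
theta = rng.normal(size=2)   # an arbitrary parameter setting
sigma = 0.5

nll = (len(y) * np.log(np.sqrt(2 * np.pi) * sigma)
       + np.sum((y - X @ theta) ** 2) / (2 * sigma ** 2))
J = 0.5 * np.sum((X @ theta - y) ** 2)
# NLL = const(n, sigma) + J / sigma^2, for every theta:
print(np.isclose(nll, len(y) * np.log(np.sqrt(2 * np.pi) * sigma) + J / sigma ** 2))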
38 Ridge Regression. Adds an L2 regularizer to Linear Regression: $J_{RR}(\theta) = J(\theta) + \lambda \|\theta\|_2^2 = \frac{1}{2} \sum_{i=1}^N (\theta^T x^{(i)} - y^{(i)})^2 + \lambda \sum_{k=1}^K \theta_k^2$. Bayesian interpretation: MAP estimation with a Gaussian prior on the parameters, $\theta_{MAP} = \operatorname{argmax}_\theta \sum_{i=1}^N \log p_\theta(y^{(i)} \mid x^{(i)}) + \log p(\theta)$, which equals $\operatorname{argmin}_\theta J_{RR}(\theta)$, where $p(\theta) \sim \mathcal{N}(0, \frac{1}{\lambda})$.
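Ridge regression keeps a closed form: the penalty just adds a multiple of the identity to $X^T X$. A sketch below; note it regularizes every coordinate, including the constant feature, to match the slide's sum over k (in practice the bias is often left unpenalized), and all data are illustrative:

import numpy as np

def ridge(X, y, lam):
    # For J_RR(theta) = 0.5 * ||X theta - y||^2 + lam * ||theta||^2, setting the
    # gradient to zero gives (X^T X + 2*lam*I) theta = X^T y. The added term also
    # repairs the rank-deficient case flagged on the normal-equations slide.
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + 2 * lam * np.eye(K), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + rng.normal(0, 0.1, size=50)
print(ridge(X, y, lam=1.0))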
39 LASSO. Adds an L1 regularizer to Linear Regression: $J_{LASSO}(\theta) = J(\theta) + \lambda \|\theta\|_1 = \frac{1}{2} \sum_{i=1}^N (\theta^T x^{(i)} - y^{(i)})^2 + \lambda \sum_{k=1}^K |\theta_k|$. Bayesian interpretation: MAP estimation with a Laplace prior on the parameters, $\theta_{MAP} = \operatorname{argmax}_\theta \sum_{i=1}^N \log p_\theta(y^{(i)} \mid x^{(i)}) + \log p(\theta)$, which equals $\operatorname{argmin}_\theta J_{LASSO}(\theta)$, where $p(\theta) \sim \text{Laplace}(0, f(\lambda))$.
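LASSO has no closed form; one standard solver is coordinate descent with soft thresholding. This is a generic sketch of that idea (a common technique, but not an algorithm prescribed by the slides), on data with a sparse true weight vector:

import numpy as np

def soft_threshold(z, t):
    # Proximal map of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for 0.5 * ||X theta - y||^2 + lam * ||theta||_1.
    theta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(X.shape[1]):
            r = y - X @ theta + X[:, k] * theta[k]   # residual with feature k removed
            theta[k] = soft_threshold(X[:, k] @ r, lam) / col_sq[k]
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 0.8]                        # sparse ground truth
y = X @ w_true + rng.normal(0, 0.1, size=100)
print(np.round(lasso_cd(X, y, lam=5.0), 2))  # zeros outside the first three coordinates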
40 Ridge Regression vs Lasso. (Figure: level sets of $J(\beta)$, i.e. $\beta$s with constant $J(\beta)$, intersecting the constraint regions: $\beta$s with constant $\ell_2$ norm for ridge, $\beta$s with constant $\ell_1$ norm for lasso.) Lasso ($\ell_1$ penalty) results in sparse solutions: a vector with more zero coordinates. Good for high-dimensional problems: we don't have to store all coordinates!
41 Non-linear basis functions. So far we only used the observed values $x_1, x_2, \dots$. However, linear regression can be applied in the same way to functions of these values. E.g., to add a term $w\, x_1 x_2$, add a new variable $z = x_1 x_2$, so each example becomes: $x_1, x_2, \dots, z$. As long as these functions can be directly computed from the observed values, the parameters are still linear in the data and the problem remains a multivariate linear regression problem: $y = w_0 + w_1 x_1 + \dots + w_k x_k + \epsilon$
42 Non-linear basis functions. What type of functions can we use? A few common examples: Polynomial: $\phi_j(x) = x^j$ for $j = 0, \dots, n$. Gaussian: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2\sigma_j^2} \right)$. Sigmoid: $\phi_j(x) = \frac{1}{1 + \exp(-s_j x)}$. Logs: $\phi_j(x) = \log(x + 1)$. Any function of the input values can be used; the solution for the parameters of the regression remains the same.
43 General linear regression problem. Using our new notation for the basis functions, linear regression can be written as $y = \sum_{j=0}^n w_j \phi_j(x)$, where $\phi_j(x)$ can be either $x_j$ for multivariate regression or one of the non-linear basis functions we defined, and $\phi_0(x) = 1$ for the intercept term.
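A sketch of this recipe with polynomial basis functions: build the design matrix of $\phi_j(x)$ values, then reuse plain least squares unchanged (the sine-shaped data are illustrative):

import numpy as np

def poly_features(x, degree):
    # Columns are phi_0(x)=1, phi_1(x)=x, ..., phi_degree(x)=x^degree.
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=50)

Phi = poly_features(x, degree=3)                # design matrix of basis values
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)     # ordinary least squares, as before
print(w)  # weights on 1, x, x^2, x^3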
44 An example: polynomial basis vectors on a small dataset. (Figures from Bishop, Ch. 1.)
45 0th Order Polynomial (n = 10)
46 1st Order Polynomial
47 3rd Order Polynomial
48 9th Order Polynomial
49 Over-fitting. Root-Mean-Square (RMS) Error: $E_{RMS} = \sqrt{2 E(\mathbf{w}^*) / N}$
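A small experiment in the spirit of these figures (the data sizes, noise level, and sin(2*pi*x) target mirror Bishop's setup, but the exact values are my own): training RMS keeps falling as the polynomial order M grows, while test RMS blows up at M = 9.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=n)

x_tr, y_tr = make_data(10)     # 10 training points, as in the n = 10 slides
x_te, y_te = make_data(100)

def rms(w, x, y):
    Phi = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - y) ** 2))

for M in [0, 1, 3, 9]:
    Phi = np.vander(x_tr, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y_tr, rcond=None)
    print(M, rms(w, x_tr, y_tr), rms(w, x_te, y_te))
# Train RMS shrinks with M; at M = 9 the fit interpolates the 10 training
# points (train RMS near 0) while the test RMS becomes very large.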
50 Polynomial Coefficients. (Table of fitted coefficient values for each polynomial order, from Bishop.)
51 9th Order Polynomial, varying Data Set Size. (Figures: the 9th-order fit as the data set grows; over-fitting diminishes with more data.)
52 Regularization. Penalize large coefficient values: $J_\lambda(\mathbf{w}) = \frac{1}{2} \sum_i \Big( y_i - \sum_j w_j \phi_j(x_i) \Big)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$
53 Regularization. (Figure: the 9th-order fit with a moderate regularizer; the wild oscillations are damped.)
54 Polynomial Coefficients. (Table: coefficient values for $\lambda$ = none, $\lambda = e^{-18}$, and huge $\lambda$; without regularization the 9th-order coefficients are enormous, and they shrink as $\lambda$ grows.)
55 Over-Regularization. (Figure: with too large a $\lambda$, the fit is overly smooth and underfits the data.)