Online Classification: Perceptron and Winnow
E0 370 Statistical Learning Theory, Lecture 18, Nov 8, 2011
Lecturer: Shivani Agarwal. Scribe: Shivani Agarwal.

1 Introduction

In this lecture we will start to study the online learning setting that was discussed briefly in the first lecture. Unlike the batch setting we have studied so far, where one is given a sample or batch of training data and the goal is to learn from this data a model that can make accurate predictions in the future, in the online setting learning takes place in a sequence of trials: on each trial, the learner must make a prediction or take some action, each of which can potentially result in some loss, and the goal is to update the prediction/decision model at the end of each trial so as to minimize the total loss incurred over a sequence of such trials. Online learning is relevant for a variety of problems, including prediction problems (e.g. forecasting the weather the next day) and decision/allocation problems (e.g. investing in different stocks or mutual funds).

We will start by considering online supervised learning problems, where on each trial the learner receives an instance and must predict its label, following which the true label is revealed and a corresponding loss incurred; as noted above, the goal of the learner is to minimize the total loss over a sequence of trials. We will focus in this lecture on online binary classification problems, and in the next lecture on online regression problems. We will then discuss online learning from experts, a framework that can be useful for both online supervised learning problems and online decision/allocation problems; we will analyze this framework in some detail in a couple of lectures, and then will conclude in the last lecture with a brief discussion of how online learning algorithms and their analyses can be transported back into the batch setting.

The basic online binary classification setting can be described as follows:

Online Binary Classification
    For $t = 1, \ldots, T$:
        Receive instance $x_t \in X$
        Predict $\hat{y}_t \in \{\pm 1\}$
        Receive label $y_t \in \{\pm 1\}$; incur loss $\ell(y_t, \hat{y}_t)$

The goal of a learning algorithm in this setting is to minimize the total loss incurred. Specifically, let $S = ((x_1, y_1), \ldots, (x_T, y_T))$. Then the cumulative loss of an algorithm $A$ on the trial sequence $S$ is given by

$$L^{\ell}_S[A] = \sum_{t=1}^{T} \ell(y_t, \hat{y}_t). \tag{1}$$

The goal is to design algorithms with small cumulative loss on any trial sequence (or any trial sequence satisfying certain properties); the analysis here is therefore worst-case, rather than probabilistic as in the batch setting. For binary classification with zero-one loss $\ell_{0\text{-}1}$, the cumulative loss of an algorithm over a trial sequence $S$ corresponds to the number of prediction mistakes made by the algorithm on this sequence; bounds on the cumulative zero-one loss $L^{0\text{-}1}_S[A]$ are therefore termed mistake bounds. In the following, we will study two classical algorithms for online binary classification, namely the perceptron and winnow algorithms, and discuss the mistake bounds that can be derived for them.
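To make the protocol concrete, here is a minimal Python sketch (an illustration, not part of the original notes) that drives an arbitrary online learner through a trial sequence and records its cumulative zero-one loss; the `predict`/`update` interface is a hypothetical convention adopted only for these sketches.

```python
def run_online(learner, trials):
    """Run an online learner over a trial sequence and count its mistakes.

    `trials` is an iterable of (x_t, y_t) pairs with y_t in {-1, +1};
    `learner` is assumed to expose predict(x) and update(x, y) methods.
    """
    mistakes = 0
    for x_t, y_t in trials:
        y_hat = learner.predict(x_t)   # prediction is made before y_t is revealed
        if y_hat != y_t:               # zero-one loss l(y_t, y_hat)
            mistakes += 1
        learner.update(x_t, y_t)       # the model may be revised after each trial
    return mistakes                    # cumulative zero-one loss L_S[A]
```

Note the worst-case flavor of the setting: the mistake bounds below hold for any trial sequence satisfying the stated margin conditions, with no distributional assumptions on how the trials are generated.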
2 Perceptron

In its basic form, the perceptron algorithm applies to Euclidean instance spaces $X \subseteq \mathbb{R}^n$, and maintains a linear classifier represented by a weight vector in such a space:

Algorithm Perceptron
    Initial weight vector $w^1 = \mathbf{0} \in \mathbb{R}^n$
    For $t = 1, \ldots, T$:
        Receive instance $x_t \in X \subseteq \mathbb{R}^n$
        Predict $\hat{y}_t = \mathrm{sign}(w^t \cdot x_t)$
        Receive label $y_t$; incur loss $\ell_{0\text{-}1}(y_t, \hat{y}_t)$
        If $\hat{y}_t \neq y_t$: $w^{t+1} = w^t + y_t x_t$; else $w^{t+1} = w^t$

Notice that the algorithm makes an update to its model (weight vector) only when there is a mistake in its prediction; online algorithms satisfying this property are said to be conservative. To get an intuitive feel for the algorithm, observe that if the true label $y_t$ on trial $t$ is $+1$ and the algorithm predicts $\hat{y}_t = -1$, then it means $w^t \cdot x_t < 0$; in order to improve the prediction on this example, the algorithm must increase the value of this dot product. Indeed, we have

$$w^{t+1} \cdot x_t = (w^t + x_t) \cdot x_t = w^t \cdot x_t + \|x_t\|_2^2 \ge w^t \cdot x_t.$$

Similarly, it can be verified that when $y_t = -1$ and the algorithm predicts $\hat{y}_t = +1$, the update has the effect of decreasing the value of the dot product. Thus the updates make sense intuitively. More formally, one can prove the following classical mistake bound for the perceptron algorithm in the linearly separable case:

Theorem 2.1 (Perceptron Convergence Theorem; Block, 1962; Novikoff, 1962). Let $S = ((x_1, y_1), \ldots, (x_T, y_T)) \in (\mathbb{R}^n \times \{\pm 1\})^T$. Let $R = \max\{\|x_t\|_2 : t \in [T]\}$ and let $\gamma > 0$. Then for any $u \in \mathbb{R}^n$ such that $y_t (u \cdot x_t) \ge \gamma$ $\forall t \in [T]$,

$$L^{0\text{-}1}_S[\mathrm{Perceptron}] \le \frac{R^2 \|u\|_2^2}{\gamma^2}.$$

Proof. Denote $L^{0\text{-}1}_S[\mathrm{Perceptron}] = k$. Consider measuring the progress towards $u$ (or closeness to $u$) on each trial in terms of $w^t \cdot u$. For each trial $t$ on which there is a mistake, we have

$$w^{t+1} \cdot u - w^t \cdot u = y_t (x_t \cdot u) \ge \gamma. \tag{2}$$

For all other trials $t$, we have $w^{t+1} \cdot u - w^t \cdot u = 0$. Therefore summing over $t = 1, \ldots, T$ gives

$$w^{T+1} \cdot u - w^1 \cdot u \ge k\gamma. \tag{3}$$

Noting that $w^1 = \mathbf{0}$ and using Cauchy-Schwarz, we have

$$k\gamma \le w^{T+1} \cdot u \le \|w^{T+1}\|_2 \, \|u\|_2. \tag{4}$$

Now for each trial $t$ on which there is a mistake,

$$\|w^{t+1}\|_2^2 = \|w^t\|_2^2 + 2 y_t (w^t \cdot x_t) + \|x_t\|_2^2 \tag{5}$$
$$\le \|w^t\|_2^2 + R^2, \quad \text{since } y_t (w^t \cdot x_t) \le 0 \text{ for a mistake trial.} \tag{6}$$

For all other trials $t$, $\|w^{t+1}\|_2^2 - \|w^t\|_2^2 = 0$. Therefore summing over $t = 1, \ldots, T$ and noting again $w^1 = \mathbf{0}$, we get

$$\|w^{T+1}\|_2^2 \le k R^2. \tag{7}$$

Substituting in Eq. (4) gives

$$k\gamma \le \sqrt{k} \, R \, \|u\|_2. \tag{8}$$

Squaring both sides yields the result. ∎
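For concreteness, here is a minimal NumPy sketch of the perceptron above (again an illustration, not from the notes), compatible with the hypothetical `run_online` harness sketched earlier; the treatment of $\mathrm{sign}(0)$ is an arbitrary convention.

```python
import numpy as np

class Perceptron:
    """Conservative perceptron: the weight vector changes only on mistakes."""

    def __init__(self, n):
        self.w = np.zeros(n)            # w^1 = 0 in R^n

    def predict(self, x):
        # sign(w^t . x_t), with sign(0) mapped to -1 (a convention choice)
        return 1 if self.w @ x > 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:        # mistake trial
            self.w = self.w + y * x     # w^{t+1} = w^t + y_t x_t
```

By Theorem 2.1, if the trial sequence is linearly separable with margin $\gamma$ by some $u$, this learner makes at most $R^2 \|u\|_2^2 / \gamma^2$ mistakes, however the trials are ordered.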
One can also show the following weaker mistake bound in the general (non-separable) case:

Theorem 2.2 (Freund and Schapire, 1999). Let $S = ((x_1, y_1), \ldots, (x_T, y_T)) \in (\mathbb{R}^n \times \{\pm 1\})^T$. Let $R = \max\{\|x_t\|_2 : t \in [T]\}$ and let $\gamma > 0$. Then for any $u \in \mathbb{R}^n$,

$$L^{0\text{-}1}_S[\mathrm{Perceptron}] \le \left( \frac{R\|u\|_2 + \sqrt{\sum_{t=1}^{T} \big( (\gamma - y_t (u \cdot x_t))_+ \big)^2}}{\gamma} \right)^{\!2}.$$

Details of the proof can be found in [1].

We conclude our discussion of the perceptron algorithm by observing that the algorithm can be re-written so as to use only dot products between instances seen by the algorithm, which facilitates a natural extension to a kernel-based variant for arbitrary instance spaces $X$:

Algorithm Kernel Perceptron
    Kernel function $K : X \times X \to \mathbb{R}$
    For $t = 1, \ldots, T$:
        Receive instance $x_t \in X$
        Predict $\hat{y}_t = \mathrm{sign}\big( \sum_{r=1}^{t-1} \alpha_r K(x_r, x_t) \big)$
        Receive label $y_t$; incur loss $\ell_{0\text{-}1}(y_t, \hat{y}_t)$
        If $\hat{y}_t \neq y_t$: $\alpha_t = y_t$; else $\alpha_t = 0$

A mistake bound similar to that for the linear perceptron algorithm can be shown in this case too:

Theorem 2.3 (Kernel Perceptron Convergence Theorem). Let $S = ((x_1, y_1), \ldots, (x_T, y_T)) \in (X \times \{\pm 1\})^T$ and let $K : X \times X \to \mathbb{R}$ be a kernel function on $X$. Let $R = \max\{\sqrt{K(x_t, x_t)} : t \in [T]\}$ and let $\gamma > 0$. Then for any $u \in X$ such that $y_t K(u, x_t) \ge \gamma$ $\forall t \in [T]$,

$$L^{0\text{-}1}_S[\mathrm{Perceptron}_K] \le \frac{R^2 K(u, u)}{\gamma^2}.$$

We leave the proof details as an exercise.
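As an illustration of the dual form (not part of the notes), here is a minimal kernel perceptron sketch; since only mistake trials have $\alpha_t \neq 0$, it suffices to store those instances. The RBF kernel shown is merely one example choice of $K$, an assumption made for this sketch.

```python
import numpy as np

class KernelPerceptron:
    """Kernel perceptron: stores mistake instances and their coefficients."""

    def __init__(self, kernel):
        self.kernel = kernel    # any function K(x, x') -> float
        self.support = []       # instances x_r from mistake trials
        self.alphas = []        # corresponding coefficients alpha_r = y_r

    def predict(self, x):
        # sign( sum_r alpha_r K(x_r, x) ); sign(0) mapped to -1 by convention
        s = sum(a * self.kernel(xr, x)
                for a, xr in zip(self.alphas, self.support))
        return 1 if s > 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:    # conservative: store only mistake trials
            self.support.append(x)
            self.alphas.append(y)

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel, one possible choice of K (an assumption here)."""
    d = x - z
    return float(np.exp(-gamma * np.dot(d, d)))
```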
3 Winnow

The winnow algorithm also maintains a linear classifier in a Euclidean instance space; in this case, however, the updates to the weight vector are multiplicative rather than additive:

Algorithm Winnow
    Learning rate parameter $\eta > 0$
    Initial weight vector $w^1 = (\frac{1}{n}, \ldots, \frac{1}{n}) \in \mathbb{R}^n$
    For $t = 1, \ldots, T$:
        Receive instance $x_t \in X \subseteq \mathbb{R}^n$
        Predict $\hat{y}_t = \mathrm{sign}(w^t \cdot x_t)$
        Receive label $y_t$; incur loss $\ell_{0\text{-}1}(y_t, \hat{y}_t)$
        If $\hat{y}_t \neq y_t$: $\forall i \in [n]$: $w^{t+1}_i = \frac{w^t_i \exp(\eta y_t x_{t,i})}{Z_t}$, where $Z_t = \sum_{j=1}^n w^t_j \exp(\eta y_t x_{t,j})$; else $w^{t+1} = w^t$

Here too, one can observe that when a mistake is made on some trial $t$, the effect of the update is to move the dot product $w^{t+1} \cdot x_t$ in the right direction compared to $w^t \cdot x_t$. Formally, we have the following mistake bound for trial sequences that are linearly separable by a non-negative weight vector:

Theorem 3.1. Let $S = ((x_1, y_1), \ldots, (x_T, y_T)) \in (\mathbb{R}^n \times \{\pm 1\})^T$. Let $R = \max\{\|x_t\|_\infty : t \in [T]\}$ and let $\gamma > 0$. Then for any $u \in \mathbb{R}^n$ such that $u_i \ge 0$ $\forall i \in [n]$ and $y_t (u \cdot x_t) \ge \gamma$ $\forall t \in [T]$,

$$L^{0\text{-}1}_S[\mathrm{Winnow}(\eta)] \le \frac{\ln n}{\frac{\eta\gamma}{\|u\|_1} - \ln\big(\frac{e^{\eta R} + e^{-\eta R}}{2}\big)}.$$

Moreover, if $R$, $\|u\|_1$, and $\gamma$ are known, then one can select $\eta^*$ to yield

$$L^{0\text{-}1}_S[\mathrm{Winnow}(\eta^*)] \le \frac{2 R^2 \|u\|_1^2 \ln n}{\gamma^2}.$$

Proof. Denote $L^{0\text{-}1}_S[\mathrm{Winnow}(\eta)] = k$, and let $p = u / \|u\|_1$ so that $p \in \Delta_n$, where $\Delta_n$ is the probability simplex in $\mathbb{R}^n$. Consider again measuring the progress towards $u$ (or $p$) on each trial; in this case, we will measure the distance of $w^t$ from $p$ in terms of the KL-divergence, $\mathrm{KL}(p \,\|\, w^t) = \sum_{i=1}^n p_i \ln \frac{p_i}{w^t_i}$. For each trial $t$ on which there is a mistake, we have

$$\mathrm{KL}(p \,\|\, w^t) - \mathrm{KL}(p \,\|\, w^{t+1}) = \sum_{i=1}^n p_i \ln \frac{w^{t+1}_i}{w^t_i} \tag{9}$$
$$= \sum_{i=1}^n p_i \ln \frac{\exp(\eta y_t x_{t,i})}{Z_t} \tag{10}$$
$$= \sum_{i=1}^n p_i \big( \eta y_t x_{t,i} - \ln Z_t \big) \tag{11}$$
$$= \eta y_t (p \cdot x_t) - \ln Z_t \quad \text{(using } \textstyle\sum_i p_i = 1 \text{)} \tag{12}$$
$$\ge \frac{\eta \gamma}{\|u\|_1} - \ln Z_t. \tag{13}$$

Now, $Z_t = \sum_{i=1}^n w^t_i e^{\eta y_t x_{t,i}}$. Noting that $y_t x_{t,i} \in [-R, R]$ for all $i, t$, we can bound $Z_t$ as follows using convexity of the mapping $z \mapsto e^{\eta z}$:

$$Z_t \le \sum_{i=1}^n w^t_i \left( \frac{1 + y_t x_{t,i}/R}{2} \, e^{\eta R} + \frac{1 - y_t x_{t,i}/R}{2} \, e^{-\eta R} \right) \tag{14}$$
$$= \frac{e^{\eta R} + e^{-\eta R}}{2} \sum_{i=1}^n w^t_i + \frac{e^{\eta R} - e^{-\eta R}}{2R} \, y_t \sum_{i=1}^n w^t_i x_{t,i} \tag{15}$$
$$= \frac{e^{\eta R} + e^{-\eta R}}{2} + \frac{e^{\eta R} - e^{-\eta R}}{2R} \, y_t (w^t \cdot x_t) \tag{16}$$
$$\le \frac{e^{\eta R} + e^{-\eta R}}{2}, \quad \text{since } e^{\eta R} - e^{-\eta R} > 0 \text{ and } y_t (w^t \cdot x_t) \le 0 \text{ for mistake trials } t. \tag{17}$$

Therefore, on each mistake trial $t$, we have

$$\mathrm{KL}(p \,\|\, w^t) - \mathrm{KL}(p \,\|\, w^{t+1}) \ge \frac{\eta \gamma}{\|u\|_1} - \ln\Big( \frac{e^{\eta R} + e^{-\eta R}}{2} \Big). \tag{18}$$

On all other trials $t$,

$$\mathrm{KL}(p \,\|\, w^t) - \mathrm{KL}(p \,\|\, w^{t+1}) = 0. \tag{19}$$

Therefore summing over $t = 1, \ldots, T$, we have

$$\mathrm{KL}(p \,\|\, w^1) - \mathrm{KL}(p \,\|\, w^{T+1}) \ge k \left( \frac{\eta \gamma}{\|u\|_1} - \ln\Big( \frac{e^{\eta R} + e^{-\eta R}}{2} \Big) \right). \tag{20}$$
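A minimal sketch of the normalized winnow update above (illustrative, not from the notes); the learning rate `eta` is left to the caller, e.g. set as in Eq. (24) below when $R$, $\|u\|_1$, and $\gamma$ are known.

```python
import numpy as np

class Winnow:
    """Normalized winnow: multiplicative weight updates on mistake trials."""

    def __init__(self, n, eta):
        self.eta = eta
        self.w = np.full(n, 1.0 / n)    # w^1 = (1/n, ..., 1/n)

    def predict(self, x):
        # sign(w^t . x_t), with sign(0) mapped to -1 (a convention choice)
        return 1 if self.w @ x > 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:                  # mistake trial
            self.w *= np.exp(self.eta * y * x)    # w_i <- w_i exp(eta y_t x_{t,i})
            self.w /= self.w.sum()                # divide by Z_t to renormalize
```

Because the weights remain on the probability simplex, comparing $w^t$ to $p = u/\|u\|_1$ via the KL-divergence, as in the proof above, is the natural multiplicative analogue of the dot-product progress measure used for the perceptron.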
Now,

$$\mathrm{KL}(p \,\|\, w^1) \le \ln n \tag{21}$$

(since $w^1$ is the uniform distribution over $[n]$), and

$$\mathrm{KL}(p \,\|\, w^{T+1}) \ge 0. \tag{22}$$

This yields the desired bound

$$k \le \frac{\ln n}{\frac{\eta\gamma}{\|u\|_1} - \ln\big(\frac{e^{\eta R} + e^{-\eta R}}{2}\big)}. \tag{23}$$

Now if $R$, $\|u\|_1$, and $\gamma$ are known, then one can minimize the right-hand side above w.r.t. $\eta$; this yields

$$\eta^* = \frac{1}{2R} \ln\Big( \frac{R\|u\|_1 + \gamma}{R\|u\|_1 - \gamma} \Big). \tag{24}$$

With this choice of $\eta$, one gets

$$k \le \frac{\ln n}{g\big( \gamma / (R\|u\|_1) \big)}, \tag{25}$$

where $g(\epsilon) = \frac{1+\epsilon}{2} \ln(1+\epsilon) + \frac{1-\epsilon}{2} \ln(1-\epsilon)$ (note that $\gamma / (R\|u\|_1) \le 1$, since $y_t (u \cdot x_t) \le \|u\|_1 R$). One can show that $g(\epsilon) \ge \epsilon^2 / 2$, which when applied to the above yields the desired result. ∎

4 Comparison of the Two Algorithms

To understand the relative strengths of the two algorithms, consider the following two examples, where $k \ll n$. In each case, write $R_2 = \max_t \|x_t\|_2$ (the radius appearing in the perceptron bound) and $R_\infty = \max_t \|x_t\|_\infty$ (the radius appearing in the winnow bound), and suppose the trial sequence is linearly separable by $u$ with margin $\gamma = 1$.

Example 1 (sparse target vector, dense instances). Let $u \in \{0, 1\}^n$ with at most $k$ non-zero components, and let $x_t \in \{\pm 1\}^n$ $\forall t$. Thus $\|u\|_1 \le k$, $\|u\|_2 \le \sqrt{k}$, $R_2 = \sqrt{n}$, and $R_\infty = 1$.

Example 2 (dense target vector, sparse instances). Let $u = \mathbf{1} \in \mathbb{R}^n$, and let $x_t \in \{0, 1\}^n$ $\forall t$ such that each $x_t$ has at most $k$ non-zero components. Thus $\|u\|_1 = n$, $\|u\|_2 = \sqrt{n}$, $R_2 \le \sqrt{k}$, and $R_\infty = 1$.

In Example 1, the mistake bound we get for perceptron is $nk$, while that for winnow is $2k^2 \ln n$. On the other hand, in Example 2, the mistake bound we get for perceptron is $kn$, whereas that for winnow is $2n^2 \ln n$. Thus, for a sparse target vector that depends on only a small number of relevant features, winnow gives a better mistake bound; for dense target vectors and sparse instances, perceptron has a better bound.

5 Next Lecture

In the next lecture we will see both additive and multiplicative update algorithms for online regression, and will derive bounds on their regret, which measures the cumulative loss of the algorithm with respect to the best possible loss within some class of predictors.

Acknowledgments. The proof of the mistake bound for winnow is based on a proof described by Sham Kakade and Ambuj Tewari in their lecture notes for a course taught at TTI Chicago in Spring 2008.

References

[1] Yoav Freund and Robert E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277-296, 1999.