Generalized Linear Methods
(Also known as: Boosting Techniques, Ensemble Methods, ...)

1 Introduction

In ensemble methods the general idea is that, by combining several weak learners, one can build a better learner. More formally, assume that we have a set of weak learners $\{g_t(x)\}_{t=1}^T$, i.e. we have trained these learners on the training data $\{(x_i, y_i)\}_{i=1}^n$ with corresponding weights $\{w_i\}_{i=1}^n$. We can create a strong model as a combination of all these weak learners by forming a linear combination of them. We define the weighted output as $G_T(x) = \operatorname{sgn}\left(\sum_{t=1}^T \alpha_t g_t(x)\right)$, where $\operatorname{sgn}(\cdot)$ is the sign function. One of the most famous methods for such boosting is called AdaBoost. Based on AdaBoost one can develop methods for gradient boosting, especially for trees. Here we first explain AdaBoost.

2 AdaBoost

Mainly introduced in [1], AdaBoost is considered one of the most important steps in statistical learning. One can show that AdaBoost minimizes an upper bound on the empirical error. To be precise, the empirical error (for classification) is defined as follows:

$$\hat{P}(Y G_T(X) \le 0) = \frac{1}{n}\left|\{i : G(x_i) \ne y_i\}\right|,$$

which can be upper-bounded by the product of the normalization constants:

$$\hat{P}(Y G_T(X) \le 0) \le \prod_{t=1}^T Z_t.$$
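As a small illustration of the combined classifier $G_T(x)$ and the empirical error above, here is a minimal sketch (not part of the original notes); the toy data, the two weak learners, and the weights are all made up for the example.

import numpy as np

# Hypothetical toy data: 6 points in 1-D with labels in {-1, +1}.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, -1, +1, +1, +1])

# Two made-up weak learners g_t(x) with outputs in {-1, +1} (decision stumps).
weak_learners = [lambda x: np.where(x > 0.0, 1, -1),
                 lambda x: np.where(x > -1.5, 1, -1)]
alphas = [1.0, 0.4]  # example weights alpha_t

# Combined classifier G_T(x) = sgn( sum_t alpha_t g_t(x) ).
scores = sum(a * g(x) for a, g in zip(alphas, weak_learners))
G = np.sign(scores)

# Empirical error: fraction of points with G(x_i) != y_i.
print("empirical error:", np.mean(G != y))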
Algorithm 1: AdaBoost.

  Input: the set of weak learners $\{g_t(x)\}_{t=1}^T$.
  Output: the weights of the generalized learner $\{\alpha_t\}_{t=1}^T$.
  Initialize the weights $w_i^{(1)} = \frac{1}{n}$, $i = 1, \dots, n$.
  for $t = 1$ to $T$ do
    Fit the learner $g_t(x)$ to the training data $\{(x_i, y_i)\}_{i=1}^n$, weighted by $\{w_i^{(t)}\}_{i=1}^n$.
    Choose $\alpha_t \in \mathbb{R}$.
    Set $w_i^{(t+1)} = \frac{w_i^{(t)} \exp[-\alpha_t y_i g_t(x_i)]}{Z_t}$ for $i = 1, \dots, n$, where $Z_t$ is a normalization factor.
  Return the output $G(x)$.

One question left unanswered so far is: how do we choose $\alpha_t$? One option is to choose it greedily so as to minimize $Z_t(\alpha)$ at each step. Since $Z_t(\alpha)$ is a convex function, it has a unique minimum. Given that $g_t(x) \in \{\pm 1\}$, the greedy choice of $\alpha_t$ gives the following answer:

$$\epsilon_t = \sum_{i=1}^n w_i^{(t)} I\left(y_i \ne g_t(x_i)\right) \qquad (1)$$

$$\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t} \qquad (2)$$

With this choice we can give a guarantee on the empirical error of the algorithm, as stated by the next theorem.

Theorem 1. The procedure in Algorithm 1, with the choice of $\alpha_t$ in Equation (2), and with $\epsilon_t \le \frac{1}{2} - \gamma$ for all $t$ (for some $\gamma \in (0, 0.5)$), satisfies the bound

$$\hat{P}(Y G_T(X) \le 0) \le \delta \qquad (3)$$

for any arbitrary $\delta > 0$, provided $T$ is big enough; more precisely, provided $T \ge \frac{\ln(1/\delta)}{2\gamma^2}$.

Next we provide the proof of Theorem 1. If you don't feel like reading a relatively boring proof, you can skip it and go to the next section!
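To make Algorithm 1 concrete, here is a minimal AdaBoost sketch in Python with decision stumps as the weak learners, using the choices of $\epsilon_t$ and $\alpha_t$ in Equations (1)-(2). It is an illustrative reconstruction, not code from the notes; the helper names and the toy data are invented for the example.

import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump on one feature: predicts sign(s * (x_j - thr))."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(s * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best[1:]  # (feature, threshold, sign)

def stump_predict(stump, X):
    j, thr, s = stump
    return np.where(s * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # w_i^(1) = 1/n
    stumps, alphas = [], []
    for t in range(T):
        stump = fit_stump(X, y, w)             # fit g_t to the weighted data
        pred = stump_predict(stump, X)
        eps = np.sum(w * (pred != y))          # Eq. (1): weighted error
        eps = np.clip(eps, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - eps) / eps)  # Eq. (2)
        w = w * np.exp(-alpha * y * pred)      # reweight the examples
        w /= w.sum()                           # divide by the normalizer Z_t
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    # G_T(x) = sgn( sum_t alpha_t g_t(x) )
    score = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(score)

# Toy usage on made-up data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, T=10)
print("training error:", np.mean(predict(stumps, alphas, X) != y))

As a side note, the value of w.sum() just before the normalization step is exactly $Z_t$, so collecting those values and multiplying them would give a numerical check of the bound $\hat{P}(Y G_T(X) \le 0) \le \prod_t Z_t$ stated above.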
We chunk the proof into smaller pieces. First we prove the following lemma.

Lemma 1. The procedure in Algorithm 1, with the choice of $\alpha_t$ in Equation (2), satisfies the following bound:

$$\hat{P}(Y G_T(X) \le 0) = \frac{1}{n}\left|\{i : G(x_i) \ne y_i\}\right| \le \prod_{t=1}^T Z_t. \qquad (4)$$

Proof. The event $Y G_T(X) \le 0$ is equivalent to $\exp(-Y G_T(X)) \ge 1$. It is then easy to see that:

$$\hat{P}(Y G_T(X) \le 0) = \frac{1}{n} \sum_{i=1}^n I\{G(x_i) \ne y_i\} \qquad (5)$$

$$\le \frac{1}{n} \sum_{i=1}^n \exp\left(-y_i G(x_i)\right) \qquad (6)$$

$$= \frac{1}{n} \sum_{i=1}^n \exp\left(-y_i \sum_{t=1}^T \alpha_t g_t(x_i)\right) \qquad (7)$$

$$= \frac{1}{n} \sum_{i=1}^n \prod_{t=1}^T \exp\left(-y_i \alpha_t g_t(x_i)\right) \qquad (8)$$

Using the updates of $w_i^{(t+1)}$ in Algorithm 1, we have:

$$\prod_{t=1}^T \exp\left(-y_i \alpha_t g_t(x_i)\right) = \left(\prod_{t=1}^T Z_t\right) \frac{w_i^{(T+1)}}{w_i^{(1)}} \qquad (9)$$

and therefore

$$\hat{P}(Y G_T(X) \le 0) \le \frac{1}{n} \sum_{i=1}^n \left(\prod_{t=1}^T Z_t\right) \frac{w_i^{(T+1)}}{w_i^{(1)}}. \qquad (10)$$

Since we chose $w_i^{(1)} = 1/n$, and $w^{(T+1)}$ is a proper probability distribution,

$$\hat{P}(Y G_T(X) \le 0) \le \left(\prod_{t=1}^T Z_t\right) \sum_{i=1}^n w_i^{(T+1)} \qquad (11)$$

$$= \prod_{t=1}^T Z_t. \qquad (12)$$
Lemma 2. In the procedure of Algorithm 1, the greedy choice of $\alpha_t$ that minimizes $Z_t$ is

$$\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t},$$

and it gives

$$Z_t = 2\sqrt{\epsilon_t (1 - \epsilon_t)},$$

where $\epsilon_t = \sum_{i=1}^n w_i^{(t)} I\left(y_i \ne g_t(x_i)\right)$.

Proof. The proof is easy. Let us write the definition of the normalization constant explicitly:

$$Z_t = \sum_{i=1}^n w_i^{(t)} e^{-\alpha_t y_i g_t(x_i)} = \sum_{i : y_i = g_t(x_i)} w_i^{(t)} e^{-\alpha_t} + \sum_{i : y_i \ne g_t(x_i)} w_i^{(t)} e^{\alpha_t} = (1 - \epsilon_t) e^{-\alpha_t} + \epsilon_t e^{\alpha_t}.$$

$Z_t$ is a convex function of $\alpha_t$ and has a unique minimum. By taking the derivative we can find the minimizer, which is

$$\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}.$$

Plugging this into the definition of $Z_t$, we find its minimum value:

$$Z_t = 2\sqrt{\epsilon_t (1 - \epsilon_t)}.$$

Lemma 3. Given the results of Lemma 1 and Lemma 2, and given that $\epsilon_t \le \frac{1}{2} - \gamma$ for all $t$, the bound on the empirical error is not more than $\left(1 - 4\gamma^2\right)^{T/2}$ (where $\gamma \in (0, 0.5)$).

Proof. By Lemma 1 and Lemma 2 we have

$$\hat{P}(Y G_T(X) \le 0) \le \prod_{t=1}^T 2\sqrt{\epsilon_t (1 - \epsilon_t)}.$$

Since $\epsilon_t \le \frac{1}{2} - \gamma$ with $\gamma \in (0, 0.5)$, and $\epsilon(1-\epsilon)$ is increasing on $[0, \frac{1}{2}]$, each factor is at most $2\sqrt{(\frac{1}{2}-\gamma)(\frac{1}{2}+\gamma)} = \sqrt{1 - 4\gamma^2}$, so

$$\hat{P}(Y G_T(X) \le 0) \le \left(1 - 4\gamma^2\right)^{T/2}.$$
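Before moving to the proof of Theorem 1, a quick numerical sanity check of Lemma 2 and of the per-round factor used in Lemma 3 (my own snippet, not from the notes; the values of $\epsilon$ and $\gamma$ are arbitrary). It compares the closed-form $\alpha_t$ against a brute-force minimization of $Z_t(\alpha) = (1-\epsilon)e^{-\alpha} + \epsilon e^{\alpha}$, checks that the minimum equals $2\sqrt{\epsilon(1-\epsilon)}$, and checks that at $\epsilon = \frac{1}{2} - \gamma$ this equals $\sqrt{1 - 4\gamma^2}$.

import numpy as np

eps, gamma = 0.3, 0.2            # arbitrary example values, eps = 1/2 - gamma
Z = lambda a: (1 - eps) * np.exp(-a) + eps * np.exp(a)

alpha_closed = 0.5 * np.log((1 - eps) / eps)     # Lemma 2 minimizer
grid = np.linspace(-5, 5, 200001)
alpha_grid = grid[np.argmin(Z(grid))]            # brute-force minimizer

print(alpha_closed, alpha_grid)                                  # should agree closely
print(Z(alpha_closed), 2 * np.sqrt(eps * (1 - eps)))             # minimum value of Z_t
print(2 * np.sqrt(eps * (1 - eps)), np.sqrt(1 - 4 * gamma**2))   # per-round factor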
Now we have everything we need for the proof of Theorem 1.

Proof of Theorem 1. Given the result of Lemma 3, we have

$$\hat{P}(Y G_T(X) \le 0) \le \left(1 - 4\gamma^2\right)^{T/2}.$$

We want this to be at most $\delta$. Since $1 - 4\gamma^2 \le e^{-4\gamma^2}$, we have $\left(1 - 4\gamma^2\right)^{T/2} \le e^{-2\gamma^2 T}$, and $e^{-2\gamma^2 T} \le \delta$ whenever

$$T \ge \frac{\ln(1/\delta)}{2\gamma^2}.$$

2.1 AdaBoost as minimizing a global objective function

An alternative view of what was described above is the minimization of a global objective function with coordinate-descent (greedy) updates. It can be shown that the global objective is

$$L = \frac{1}{n} \sum_{i=1}^n e^{-y_i \sum_t \alpha_t f_t(x_i)},$$

optimized locally (one coordinate at a time) with respect to $\alpha_t$.

2.2 More general AdaBoost

As shown above, the standard form of AdaBoost can be interpreted as minimizing an exponential loss function. One can derive a general form of AdaBoost for an arbitrary loss function, but due to the computational cost these general forms are not very popular.

3 Gradient Boosting

Building on boosting, new methods were proposed for gradient boosting. In fact, gradient boosting methods use both boosting and gradient methods, especially gradient descent (at the functional level). Using only gradient methods has cons, e.g. a negative effect on the generalization error and issues with local optimization. The gbm package in the R language, however, suggests selecting a class of functions that uses the covariate information to approximate the gradient, usually a regression tree.
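As an illustrative sketch of that idea (my example, not the notes'): with squared loss $L(y, G(x)) = \frac{1}{2}(y - G(x))^2$, the negative gradient at the current fit is simply the residual $y_i - G(x_i)$, and a regression tree can be fit to those residuals. The data and variable names below are made up.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

G = np.full(300, y.mean())          # current approximation (a constant here)

# Negative gradient of the squared loss 0.5*(y - G)^2 w.r.t. G is the residual.
residuals = y - G

# Approximate the gradient with a function of the covariates: a small tree.
tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
print("corr(tree prediction, residual):",
      np.corrcoef(tree.predict(X), residuals)[0, 1])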
The algorithm is shown in Algorithm 2. At each iteration the algorithm determines the gradient, i.e. the direction in which it needs to change the current approximation to better fit the data, by selecting from a class of functions. In other words, it selects the function which has the most agreement with the current approximation error.

Just to recall what the problem is formally, we want to find a function $G$ such that

$$G^*(x) = \arg\min_{G \in \mathcal{G}} \mathbb{E}_{x,y}\left[L(y, G(x))\right] = \arg\min_{G \in \mathcal{G}} \int_{x,y} L(y, G(x))\, P(x, y)\, dx\, dy.$$

But in practice the true distribution $P(x, y)$ is not known; instead we have samples from it, in the form of $D = \{(x_i, y_i)\}_{i=1}^n$. Also, the set of functions we can choose from is limited, which we represent with $\mathcal{G}$. Thus the problem is approximated in the following form:

$$G^*(x) \approx \arg\min_{G \in \mathcal{G}} \sum_{(x_i, y_i) \in D} L\left(y_i, G(x_i)\right).$$

Suppose a good approximation can be written as a linear combination of some coarse approximations:

$$\sum_{t=1}^T \alpha_t g_t(x).$$

Suppose the following is our initial approximation, a constant function:

$$G_0(x) = \arg\min_{\alpha} \sum_{(x_i, y_i) \in D} L(y_i, \alpha).$$
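As a quick aside (my own illustration, not from the notes): for squared loss the minimizing constant is the sample mean of the $y_i$, and for absolute loss it is the median; the snippet below just checks this numerically on made-up data with a brute-force grid.

import numpy as np

y = np.array([0.2, 1.5, 3.0, 3.1, 7.4])       # made-up responses
grid = np.linspace(y.min(), y.max(), 100001)  # candidate constant fits

sq = ((y[None, :] - grid[:, None]) ** 2).sum(axis=1)   # squared loss
ab = np.abs(y[None, :] - grid[:, None]).sum(axis=1)    # absolute loss

print(grid[sq.argmin()], y.mean())      # squared loss  -> mean   (~3.04)
print(grid[ab.argmin()], np.median(y))  # absolute loss -> median (~3.0)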
This is followed by the incremental approximations:

$$G_m(x) = G_{m-1}(x) + \arg\min_{g \in \mathcal{G}} \sum_{(x_i, y_i) \in D} L\left(y_i, G_{m-1}(x_i) + g(x_i)\right).$$

Since the minimization in the previous equation is done over functions (a functional minimization), it is relatively hard to solve. Instead we can approximate it with greedy (functional-)gradient based updates. The negative functional gradient of the loss function,

$$-\nabla_g L\left(y, G(x) + g(x)\right),$$

is the direction in which the loss function decreases the most. Thus the following updates, with an appropriate choice of step size, will result in a reduction of the loss:

$$G_m(x) = G_{m-1}(x) - \gamma_m \nabla L\left(y_i, G_{m-1}(x_i)\right), \quad x_i \in D.$$

One possible way to find the step size is via line search:

$$\gamma_m = \arg\min_{\gamma} \sum_{x_i \in D} L\left(y_i, G_{m-1}(x_i) - \gamma\, \nabla L\left(y_i, G_{m-1}(x_i)\right)\right).$$

Algorithm 2: Gradient boosting.

  Input: the training data $\{(x_i, y_i)\}_{i=1}^n$, a loss function $L$, and the number of iterations $M$.
  Output: the boosted model $G_M(x)$.
  Initialize the approximation with a constant, $G_0(x) = \arg\min_{\gamma} \sum_i L(y_i, \gamma)$.
  for $m = 1$ to $M$ do
    Compute the negative gradient $r_m = (r_{1m}, r_{2m}, \dots, r_{nm})$, where
    $$r_{im} = -\left[\frac{\partial L\left(y_i, G(x_i)\right)}{\partial G(x_i)}\right]_{G(x) = G_{m-1}(x)}$$
    Fit a function $h_m(x)$ to the gradient residuals $\{(x_i, r_{im})\}_{i=1}^n$.
    Find the scaling parameter $\gamma_m$ that minimizes the objective:
    $$\gamma_m = \arg\min_{\gamma} \sum_{x_i \in D} L\left(y_i, G_{m-1}(x_i) + \gamma\, h_m(x_i)\right)$$
    Update $G_m(x) = G_{m-1}(x) + \gamma_m h_m(x)$.
  Return the output $G_M(x)$.
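Here is a from-scratch sketch of Algorithm 2 specialized to squared loss, with small regression trees as the function class. It is my own illustration, not code from the notes; the helper names and the closed-form line search for squared loss are assumptions for this particular loss.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_sq(X, y, M=100, max_depth=2):
    """Algorithm 2 specialized to squared loss L(y, G) = 0.5 * (y - G)^2."""
    g0 = y.mean()                           # G_0: constant minimizer (the mean)
    G = np.full(len(y), g0)
    trees, gammas = [], []
    for m in range(M):
        r = y - G                           # negative gradient = residuals
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        hx = h.predict(X)
        # Line search: for squared loss the minimizer has a closed form.
        gamma = (r @ hx) / max(hx @ hx, 1e-12)
        G = G + gamma * hx                  # G_m = G_{m-1} + gamma_m * h_m
        trees.append(h)
        gammas.append(gamma)
    return g0, trees, gammas

def boost_predict(model, X):
    g0, trees, gammas = model
    return g0 + sum(g * t.predict(X) for g, t in zip(gammas, trees))

# Toy usage on made-up data.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)
model = gradient_boost_sq(X, y, M=50)
print("training MSE:", np.mean((boost_predict(model, X) - y) ** 2))

Note that for squared loss the tree is itself fit by least squares to the residuals, so the line search returns $\gamma_m$ essentially equal to 1; with other losses the line search matters more.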
Suppose now we want to generalize Algorithm 2 to trees. The only difference is that the approximating function $h_m(x)$ is a tree, which we can represent as $\sum_j b_j I(x \in R_j)$, where $I(\cdot)$ is an indicator function showing whether the input $x$ belongs to the region $R_j$ or not, and $b_j$ is the prediction for the values in this region. In the following step, a value of $\gamma_m$ is estimated via line search. In [2] it is suggested to use a separate value for each region. In other words, change

$$\gamma_m = \arg\min_{\gamma} \sum_{x_i \in D} L\left(y_i, G_{m-1}(x_i) + \gamma\, h_m(x_i)\right)$$

to

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\left(y_i, G_{m-1}(x_i) + \gamma\, b_j\right).$$

Note that, in this line search, the values of $\{b_j\}_{j=1}^{J_m}$ do not have any effect on the final result; the only things that matter are the regions $\{R_j\}_{j=1}^{J_m}$. Thus we can simplify it and write it as follows:

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\left(y_i, G_{m-1}(x_i) + \gamma\right).$$

The overall algorithm is shown in Algorithm 3.

Algorithm 3: Gradient boosting for trees.

  Input: the training data $\{(x_i, y_i)\}_{i=1}^n$, a loss function $L$, and the number of iterations $M$.
  Output: the boosted tree model $G_M(x)$.
  Initialize with a single-node tree, $G_0(x) = \arg\min_{\gamma} \sum_i L(y_i, \gamma)$.
  for $m = 1$ to $M$ do
    Compute the negative gradient $r_m = (r_{1m}, r_{2m}, \dots, r_{nm})$, where
    $$r_{im} = -\left[\frac{\partial L\left(y_i, G(x_i)\right)}{\partial G(x_i)}\right]_{G(x) = G_{m-1}(x)}$$
    Fit a regression tree to the pseudo-responses (residuals) $\{(x_i, r_{im})\}_{i=1}^n$, which yields terminal regions $R_{jm}$, $j = 1, \dots, J_m$.
    for $j = 1, 2, \dots, J_m$ do
      $$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\left(y_i, G_{m-1}(x_i) + \gamma\right)$$
    Update $G_m(x) = G_{m-1}(x) + \sum_j \gamma_{jm} I(x \in R_{jm})$.
  Return the output $G_M(x)$.

In the gbm library of R, for the scenario shown in Algorithm 3, the shrinkage parameter is the $\lambda$ (or learning rate) parameter in the gradient updates (see the Regularization section below). So the main effort in model selection is the choice of the n.trees and shrinkage parameters.
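As a rough illustration of the same knobs outside R (my gloss, not a statement from the notes): scikit-learn's GradientBoostingRegressor exposes analogous parameters, with n_estimators playing the role of n.trees and learning_rate the role of shrinkage.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# n_estimators ~ n.trees, learning_rate ~ shrinkage.
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=2)
model.fit(X, y)
print("training MSE:", np.mean((model.predict(X) - y) ** 2))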
Regularization

Since gradient boosting with trees has a large number of degrees of freedom, it is highly prone to overfitting. One possible way to reduce the amount of overfitting is shrinkage of the coefficients. Suppose we have a parameter $\lambda \in (0, 1)$, such that:

$$G_m(x) = G_{m-1}(x) + \lambda\, \gamma_m h_m(x).$$

4 Final notes

Some of the intuition is from David Forsyth's class at UIUC. Peter Bartlett's class notes provided a very good summary of the main points.

References

[1] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory. Springer.

[2] J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning.