Open Problem: The landscape of the loss surfaces of multilayer networks
JMLR: Workshop and Conference Proceedings vol 40:1–5, 2015. 28th Annual Conference on Learning Theory.

Open Problem: The landscape of the loss surfaces of multilayer networks

Anna Choromanska, Courant Institute of Mathematical Sciences, New York University, New York. ACHOROMA@CIMS.NYU.EDU

Yann LeCun, Courant Institute of Mathematical Sciences, New York University, and Facebook Research, New York. YANN@CS.NYU.EDU

Gérard Ben Arous, Courant Institute of Mathematical Sciences, New York University, New York. BENAROUS@CIMS.NYU.EDU

Editor: Under Review for COLT 2015

Abstract

Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of its critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory may potentially provide an explanation for these problems by establishing a connection between the loss function of neural networks and the Hamiltonian of spherical spin-glass models. The connection between the two models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in reality. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between the two models.

Keywords: multilayer networks, deep learning, spherical spin-glass model, Hamiltonian, non-convex optimization.

1. Introduction

The vast majority of practical applications of deep learning use multi-stage architectures composed of alternated layers of linear transformations and max functions (most often Rectified Linear Units, e.g. Nair and Hinton (2010)), and focus on supervised learning, where the loss function that needs to be minimized is most often the cross entropy or the hinge loss. Several researchers experimenting with larger networks have noticed that, while multilayer nets do have many local minima, the results of multiple experiments consistently give very similar performance. This suggests that all those local minima are more or less equivalent in terms of error. It was also previously noticed that the problem of training deep learning systems resides in avoiding saddle points and quickly breaking the symmetry by picking a side of a saddle point and choosing a suitable attractor (LeCun et al. (1998); Saxe et al. (2014); Dauphin et al. (2014)).

Earlier theoretical analyses, conveniently reviewed in Dauphin et al. (2014), suggest the existence of a certain structure of critical points of random Gaussian error functions on high dimensional continuous spaces. They imply that critical points whose error is much higher than the global minimum are exponentially likely to be saddle points with many negative and approximate plateau directions, whereas all local minima are likely to have an error very close to that of the global minimum. Their work establishes a strong empirical connection between neural networks and the theory of random Gaussian fields by providing experimental evidence that the cost function of neural networks exhibits the same properties as Gaussian error functions on high dimensional continuous spaces. Nevertheless, they provide no theoretical justification for the existence of this connection.

© 2015 A. Choromanska, Y. LeCun & G. Ben Arous.
2. The connection between multilayer networks and spin-glass models

We next discuss the assumptions that were made in Choromanska et al. (2015) to establish a connection between the loss function of neural networks and the Hamiltonian of the spherical spin-glass models (for detailed explanations see Choromanska et al. (2015)). The assumptions are numbered and marked with the letter p or u, denoting whether the assumption is respectively plausible, i.e. it can be satisfied in practice or else it can be imposed on the network without significantly changing its performance, or obviously unrealistic; e.g. A1p denotes the first assumption, which is plausible.

It can be shown that the loss function of a typical multilayer network with ReLUs can be expressed as a polynomial function of the weights in the network, whose degree is the number of layers, and whose number of monomials is the number of paths (denoted as $\Psi$) from inputs to outputs. As the weights (or the inputs) vary, some of the monomials are switched off and others become activated.

Consider a simple model of a fully-connected feed-forward neural network with $H$ hidden layers ($n_i$ denotes the number of units in the $i$-th hidden layer, where the input layer has index $i = 0$ and the output layer has index $i = H$), having a single output (consider a binary classification problem). Let $\Lambda = \sqrt[H]{\Psi}$, and assume $\Lambda \in \mathbb{Z}^{+}$. Let $X_i$ be the random input of the $i$-th path of the network. Then the normalized output of the network can be expressed as

$$Y = \Lambda^{-(H-1)/2} \sum_{i=1}^{\Psi} X_i A_i \prod_{k=1}^{H} w_i^{(k)},$$

where $w_i^{(k)}$ is the weight of the $k$-th segment of the $i$-th path (this segment connects layer $(k-1)$ with layer $k$ of the network), and $A_i$ is a Bernoulli random variable denoting whether the $i$-th path is active ($A_i = 1$) or not ($A_i = 0$).
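To make the path decomposition concrete, here is a minimal numerical check (our own sketch, not code from the paper; the layer sizes, seed, and variable names are arbitrary illustrative choices). It omits the $\Lambda^{-(H-1)/2}$ normalization, which only rescales the output, and verifies that a standard ReLU forward pass equals the sum over all input-to-output paths of the input value times the product of weights along the path times a 0/1 activation indicator:

```python
import itertools
import numpy as np

# Check the path decomposition: for a ReLU network, the output equals the sum
# over all input-to-output paths of (input value) x (product of weights along
# the path) x (0/1 indicator that every ReLU on the path fired).
rng = np.random.default_rng(0)
sizes = [4, 5, 5, 1]             # input, two hidden layers, single output (H = 3 weight layers)
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(sizes[0])

# Standard forward pass, recording which ReLUs are active.
h, active = x, []
for k, W in enumerate(Ws):
    z = W @ h
    if k < len(Ws) - 1:          # ReLU on hidden layers only
        active.append(z > 0)
        h = np.maximum(z, 0.0)
    else:
        h = z
y_forward = h[0]

# Path sum: enumerate every path (j0, j1, ..., jH) through the network.
y_paths = 0.0
for path in itertools.product(*[range(n) for n in sizes]):
    w_prod = np.prod([Ws[k][path[k + 1], path[k]] for k in range(len(Ws))])
    # A_i in {0, 1}: 1 iff every hidden unit along the path is active.
    A = np.prod([active[k][path[k + 1]] for k in range(len(Ws) - 1)])
    y_paths += x[path[0]] * A * w_prod

assert np.isclose(y_forward, y_paths)   # the two computations agree
```

The identity holds because a ReLU simply multiplies its input by a 0/1 indicator, so unrolling the forward pass yields exactly one monomial per path; which indicators $A_i$ equal 1 depends on both the weights and the input, which is precisely what switches monomials on and off.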
Consider the hinge loss $L(w) = \max(0, 1 - Y_t Y)$, where $Y_t$ is a random variable corresponding to the true data labeling, taking values $-1$ or $1$, and $w$ denotes all the network weights. Recall that the max operator is often modeled as a Bernoulli random variable taking values $0$ or $1$. Denote this random variable as $M$ and its expectation as $\rho'$. Therefore

$$L(w) = M(1 - Y_t Y) = M + \Lambda^{-(H-1)/2} \sum_{i=1}^{\Psi} Z_i I_i \prod_{k=1}^{H} w_i^{(k)}, \qquad (1)$$

where $Z_i = -Y_t X_i$, and $I_i = M A_i$ is a Bernoulli random variable taking values $0$ or $1$. Assume that the random variables $I_1, I_2, \dots, I_\Psi$ have the same probability of success (A1p), and thus they have the same expectation, denoted as $\rho$. Also assuming that each $X_i$ is a standard Gaussian random variable (A2p), it follows that $Z_i$ is also a standard Gaussian random variable.

For large-size networks a large number of network parameters are redundant (Denil et al. (2013)) and can either be learned from a very small set of unique parameters or not learned at all, with almost no loss in prediction accuracy. Assume that $\Lambda$ is the maximal number of non-redundant (unique) parameters (A3p), and that they are uniformly distributed on the graph of connections of the network (A4p), i.e. every $H$-length product of unique weights appears in Equation (1) (the set of all products is $\{w_{i_1} w_{i_2} \cdots w_{i_H}\}_{i_1, i_2, \dots, i_H = 1}^{\Lambda}$). Thus re-indexing the terms gives

$$L(w) = M + \Lambda^{-(H-1)/2} \sum_{i_1, i_2, \dots, i_H = 1}^{\Lambda} Z_{i_1, i_2, \dots, i_H} I_{i_1, i_2, \dots, i_H}\, w_{i_1} w_{i_2} \cdots w_{i_H}.$$

Assuming (A5u) the independence of $Z_{i_1, \dots, i_H}$ and $I_{i_1, \dots, i_H}$, one obtains

$$\mathbb{E}_{M, I_1, I_2, \dots, I_\Psi}[L(w)] = \rho' + \rho\, \Lambda^{-(H-1)/2} \sum_{i_1, i_2, \dots, i_H = 1}^{\Lambda} Z_{i_1, i_2, \dots, i_H}\, w_{i_1} w_{i_2} \cdots w_{i_H}.$$

It is also assumed that the $Z$'s are independent (A6u). Finally, the spherical assumption (A7p) imposes that $\frac{1}{\Lambda} \sum_{i=1}^{\Lambda} w_i^2 = 1$. Note that the sum in the last expression is the Hamiltonian of the spherical spin-glass model (Auffinger et al. (2010)).
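One way to see why the $\Lambda^{-(H-1)/2}$ normalization is natural: for any fixed $w$ satisfying the spherical constraint, the Hamiltonian has mean $0$ and variance $\Lambda^{-(H-1)} (\sum_i w_i^2)^H = \Lambda$ over the Gaussian couplings, so typical energies scale linearly with $\Lambda$ (the same scaling used to rescale the spin-glass losses in Appendix A, footnote 2). Below is a small Monte-Carlo check of this variance computation; it is our own illustration, with arbitrary sizes and seed:

```python
import numpy as np

# Monte-Carlo check: for fixed w with (1/Lam) * sum w_i^2 = 1, the Hamiltonian
#   H_L(w) = Lam^{-(H-1)/2} * sum_{i1..iH} Z_{i1..iH} w_{i1} ... w_{iH}
# with i.i.d. standard Gaussian couplings Z has mean 0 and variance
#   Lam^{-(H-1)} * (sum_i w_i^2)^H = Lam.
rng = np.random.default_rng(0)
H, Lam, trials = 3, 10, 2000

w = rng.standard_normal(Lam)
w *= np.sqrt(Lam) / np.linalg.norm(w)    # enforce the spherical constraint

outer = w                                # rank-H tensor of weight products
for _ in range(H - 1):
    outer = np.multiply.outer(outer, w)  # outer[i1..iH] = w_{i1} ... w_{iH}

samples = []
for _ in range(trials):
    Z = rng.standard_normal((Lam,) * H)  # fresh couplings each trial
    samples.append(Lam ** (-(H - 1) / 2) * np.sum(Z * outer))

print(np.var(samples), "should be close to Lam =", Lam)
```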
It was recently shown (Auffinger et al. (2010)) that the Hamiltonian of this model has interesting properties when the size of the model ($\Lambda$) goes to infinity. We next list these properties along with their possible interpretation for neural networks: (i) critical points form an ordered structure such that there exists an energy barrier (a certain value of the Hamiltonian) below which with overwhelming probability one can find only low-index critical points¹, most of which are concentrated close to the barrier (this would explain why in the case of large networks the recovered local minima typically correspond to the same test performance, which is not the case for small networks), (ii) recovering the ground state, i.e. the global minimum, takes an exponentially long time, (iii) with overwhelming probability one can find only high-index saddle points above the energy barrier, and there are exponentially many of those (this would explain the importance of saddle points in the optimization problem), and (iv) low-index critical points lie geometrically closer to the ground state than high-index critical points (this would explain why recovering poor quality local minima, which are far from the global minimum, is more likely for small-size networks than for large-size networks).

Open problem: Is it possible to establish a connection between the loss function of neural networks and the Hamiltonian of the spherical spin-glass models under milder assumptions? The central problem is to eliminate the unrealistic assumptions of variable independence (A5u and A6u). Note that assumption A5u implies that the activation mechanism of any path (for the $i$-th path it is denoted as $I_i$) is independent of the input data, which clearly cannot be true. Similarly, assumption A6u implies that all paths have independent inputs, which cannot be true since many paths share the same input. Alternatively, it would also be desirable to find network architectures for which the connection to spin-glass models can be established explicitly with only mild (plausible) assumptions, if any.

References

A. Auffinger, G. Ben Arous, and J. Černý. Random matrices and complexity of spin glasses. arXiv:1003.1129, 2010.

A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, and Y. LeCun. The loss surfaces of multilayer networks. In AISTATS, 2015.

Y. Dauphin, R. Pascanu, Ç. Gülçehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In NIPS, 2014.

M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. D. Freitas. Predicting parameters in deep learning. In NIPS, 2013.

Y. LeCun, L. Bottou, G. Orr, and K. Muller. Efficient backprop. In Neural Networks: Tricks of the Trade. Springer, 1998.

V. Nair and G. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.

A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In ICLR, 2014.

1. The index of $L$ at $w$ is the number of negative eigenvalues of the Hessian $\nabla^2 L$ at $w$. Local minima have index $0$.

Appendix A. Empirical evidence

Figure 1: Distributions of the scaled test losses for the spin-glass (left) and the neural network (right) experiments.

In this section we briefly summarize a subset of results from Choromanska et al. (2015) showing the similarity between the loss function of neural networks and the Hamiltonian of the spherical spin-glass models. The spin-glass model was simulated for $\Lambda$ ranging from 25 to 500, where for each value of $\Lambda$ the distribution of minima was obtained by sampling initial points on the unit sphere and performing stochastic gradient descent (SGD) to find a minimum energy point. The neural network model was simulated using a scaled-down version of MNIST, where each image was downsampled to size $10 \times 10$. Networks were trained with one hidden layer and nhidden = {25, 50, 100, 250, 500} hidden units, each one starting from a random set of parameters sampled uniformly within the unit cube. All networks were trained for the same number of epochs using SGD with learning rate decay.

The distribution of the scaled test losses² is compared in Figure 1 for both models. We see that for small values of $\Lambda$ and nhidden, we obtain poor local minima³ in many experiments. For larger values of $\Lambda$ and nhidden, the variance of the losses decreases, and the distribution becomes increasingly concentrated around the energy barrier, where local minima have high quality. This indicates that (i) getting stuck in poor local minima is a major problem for smaller networks but becomes gradually less important as the network size increases, and (ii) in the case of larger networks the recovered local minima typically correspond to the same test performance, which is not the case for small networks.

2. To observe qualitative differences in behavior for different values of $\Lambda$ (for the spin-glass model) and nhidden (for the neural network), it is necessary to rescale the loss values to make their expected values approximately equal. For spin-glasses, the expected value of the loss at critical points scales linearly with $\Lambda$, therefore the losses have to be divided by $\Lambda$, whereas for neural networks the expected value of the loss at critical points was empirically found to scale with nhidden according to the power law $\mathbb{E}[L] \approx e^{\alpha}\,\mathrm{nhidden}^{\beta}$ ($\alpha$ and $\beta$ are coefficients), therefore the losses were rescaled as $L / (e^{\alpha}\,\mathrm{nhidden}^{\beta})$.

3. Almost all recovered solutions were local minima with index equal to $0$ (while computing the index of the solutions, all eigenvalues below a small magnitude threshold were set to $0$).
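The spin-glass half of the experiment above is straightforward to imitate. The following self-contained sketch is ours, not the paper's code: it uses plain projected gradient descent rather than SGD, and $\Lambda$, the step size, the number of steps, and the number of restarts are arbitrary illustrative choices. It builds the $H$-spin Hamiltonian from Section 2 with i.i.d. Gaussian couplings, descends it from random points on the sphere, and rescales the recovered energies by $\Lambda$ as in footnote 2:

```python
import numpy as np

# Rough imitation of the spin-glass experiment: sample starting points on the
# sphere ||w||^2 = Lam and descend the Hamiltonian by projected gradient steps.
rng = np.random.default_rng(2)
H, Lam = 3, 25
Z = rng.standard_normal((Lam,) * H)        # i.i.d. Gaussian couplings
scale = Lam ** (-(H - 1) / 2)

def energy(w):
    val = Z
    for _ in range(H):
        val = val @ w                      # contract one index per step
    return scale * val

def grad(w):
    g = np.zeros_like(w)
    for axis in range(H):                  # one term per position of the free index
        val = np.moveaxis(Z, axis, 0)      # put the free index first
        for _ in range(H - 1):
            val = val @ w                  # contract the remaining H-1 indices
        g += val
    return scale * g

def project(w):                            # enforce (1/Lam) * sum w_i^2 = 1
    return w * np.sqrt(Lam) / np.linalg.norm(w)

minima = []
for _ in range(20):                        # a handful of random restarts
    w = project(rng.standard_normal(Lam))
    for _ in range(500):
        w = project(w - 0.01 * grad(w))
    minima.append(energy(w) / Lam)         # rescale by Lambda, as in footnote 2
print(f"scaled energies after descent: min {min(minima):.3f}, max {max(minima):.3f}")
```

For larger $\Lambda$ and more restarts, the distribution of the rescaled recovered energies should concentrate, mirroring the behavior summarized in the left panel of Figure 1.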
Appendix B. Spherical spin-glass model

Figure 2 shows exemplary plots of the distributions of the mean number of critical points, local minima, and low-index saddle points. Clearly, local minima and low-index saddle points are located in the band $(-\Lambda E_0(H), -\Lambda E_\infty(H))$, where $-\Lambda E_\infty(H)$ is the energy barrier and $-\Lambda E_0(H)$ corresponds to the ground state (global minimum), whereas high-index saddle points can only be found above the energy barrier $-\Lambda E_\infty(H)$. This geometric structure, if it is also true for multilayer neural networks, plays a crucial role in the optimization problem. The optimizer, e.g. SGD, often easily avoids the band of high-index critical points, which have many negative curvature directions, and descends to the band of low-index critical points, which lie closer to the global minimum. Thus finding a bad-quality solution, i.e. one far away from the global minimum, is highly unlikely for large-size networks (this is also confirmed by the experimental results in Figure 1). Furthermore, as shown in Figure 2, low-index critical points are mostly concentrated close to the energy barrier (a "peaked" distribution), which would potentially explain why in the case of large networks the recovered local minima typically correspond to the same test performance, which is not the case for small networks.

Figure 2: Distribution of the mean number of critical points, local minima and low-index saddle points (original and zoomed; $k$ denotes the index), for $H = 3$ and a fixed value of $\Lambda$. Black line: $u = -\Lambda E_\infty(H)$; red line: $u = -\Lambda E_0(H)$, which corresponds to the ground state (global minimum). Figure must be read in color.
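Footnotes 1 and 3 specify how the index of a recovered solution is computed: count the negative eigenvalues of the Hessian after zeroing out eigenvalues of negligible magnitude. The sketch below is a rough, self-contained illustration of that bookkeeping for the $3$-spin energy on the sphere; the tangent-space (Riemannian) Hessian formula, the threshold value, and all sizes are our own choices, not the paper's procedure. After a short projected descent it should typically report index $0$, i.e. a local minimum:

```python
import itertools
import numpy as np

# Illustrative index computation for the 3-spin energy
#   E(w) = scale * sum_{i,j,k} Z_{ijk} w_i w_j w_k   on the sphere ||w||^2 = Lam.
rng = np.random.default_rng(3)
H, Lam = 3, 15
Z = rng.standard_normal((Lam,) * H)
# Symmetrize the couplings so that gradient and Hessian formulas are simple.
Zs = sum(np.transpose(Z, p) for p in itertools.permutations(range(H))) / 6.0
scale = Lam ** (-(H - 1) / 2)

def grad(w):
    return 3.0 * scale * (Zs @ w @ w)      # gradient of the symmetric cubic form

def hess(w):
    return 6.0 * scale * (Zs @ w)          # Euclidean Hessian, shape (Lam, Lam)

def index_on_sphere(w, tol=1e-3):
    """Number of negative eigenvalues of the Hessian restricted to the sphere."""
    P = np.eye(Lam) - np.outer(w, w) / Lam          # tangent-space projector
    # Second derivative of E along geodesics of the radius-sqrt(Lam) sphere.
    M = P @ (hess(w) - (w @ grad(w) / Lam) * np.eye(Lam)) @ P
    eigs = np.linalg.eigvalsh(M)
    eigs[np.abs(eigs) < tol] = 0.0         # zero-out tiny eigenvalues (cf. footnote 3)
    return int(np.sum(eigs < 0.0))         # local minima have index 0 (footnote 1)

# Descend to (approximately) a local minimum, whose index should be 0.
w = rng.standard_normal(Lam)
w *= np.sqrt(Lam) / np.linalg.norm(w)
for _ in range(2000):
    w = w - 0.02 * grad(w)
    w *= np.sqrt(Lam) / np.linalg.norm(w)  # project back onto the sphere
print("index of the recovered solution:", index_on_sphere(w))
```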