A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
Toyota Technological Institute at Chicago
{bneyshabur, srinadh, mcallester,

Abstract

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.

1 Introduction

In this note we present and prove a margin-based generalization bound for feedforward neural networks that depends on the product of the spectral norms of the weights in each layer, as well as the Frobenius norm of the weights. Our generalization bound shares much similarity with a margin-based generalization bound recently presented by Bartlett et al. [1]. Both bounds depend similarly on the product of the spectral norms of each layer, multiplied by a factor that is additive across layers. In addition, Bartlett et al.'s [1] bound depends on the elementwise $\ell_1$-norm of the weights in each layer, while our bound depends on the Frobenius (elementwise $\ell_2$) norm of the weights in each layer, with an additional multiplicative dependence on the width. The two bounds are thus not directly comparable, and each one dominates in a different regime, roughly depending on the sparsity of the weights.

More importantly, our proof technique is entirely different, and arguably simpler, than that of Bartlett et al. [1]. We derive our bound using PAC-Bayes analysis, and more specifically a generic PAC-Bayes margin bound (Lemma 1). The main ingredient is a perturbation bound (Lemma 2), bounding the changes in the output of a network when the weights are perturbed, in terms of the product of the spectral norms of the layers. This is an entirely different analysis approach from Bartlett et al.'s [1] covering number analysis. We hope our analysis can give more direct intuition into the different ingredients in the bound and will allow modifying the analysis, e.g. by using different prior and perturbation
distributions in the PAC-Bayes bound, to obtain tighter bounds, perhaps with dependence on different layer-wise norms.

We note that prior bounds in terms of elementwise or unit-wise norms (such as the Frobenius norm and elementwise $\ell_1$ norms of layers), without a spectral norm dependence, all have a multiplicative dependence across layers or exponential dependence on depth (Bartlett and Mendelson [3], Neyshabur et al. [11]), or are for constant-depth networks (Bartlett [2]). Here only the spectral norm is multiplied across layers, and thus if the spectral norms are close to one, the exponential dependence on depth can be avoided.

1.1 Preliminaries

Consider the classification task with input domain $\mathcal{X}_{B,n} = \{x \in \mathbb{R}^n : \sum_{i=1}^n x_i^2 \le B^2\}$ and output domain $\mathbb{R}^k$, where the output of the model is a score for each class and the class with the maximum score will be selected as the predicted label. Let $f_w(x): \mathcal{X}_{B,n} \to \mathbb{R}^k$ be the function computed
by a $d$-layer feed-forward network for the classification task with parameters $w = \mathrm{vec}(\{W_i\}_{i=1}^d)$, $f_w(x) = W_d\,\phi(W_{d-1}\,\phi(\cdots\phi(W_1 x)))$, where $\phi$ is the ReLU activation function. Let $f_w^i(x)$ denote the output of layer $i$ before activation and $h$ be an upper bound on the number of output units in each layer. We can then define fully connected feedforward networks recursively: $f_w^1(x) = W_1 x$ and $f_w^i(x) = W_i\,\phi(f_w^{i-1}(x))$. Let $\|\cdot\|_F$, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the Frobenius norm, the element-wise $\ell_1$ norm and the spectral norm respectively. We further denote the $\ell_p$ norm of a vector by $|\cdot|_p$. For any distribution $\mathcal{D}$ and margin $\gamma > 0$, we define the expected margin loss as follows:

$$L_\gamma(f_w) = \Pr_{(x,y)\sim\mathcal{D}}\left[f_w(x)[y] \le \gamma + \max_{j\ne y} f_w(x)[j]\right].$$

Let $\hat{L}_\gamma(f_w)$ be the empirical estimate of the above expected margin loss. Since setting $\gamma = 0$ corresponds to the classification loss, we will use $L_0(f_w)$ and $\hat{L}_0(f_w)$ to refer to the expected risk and the training error. The loss $L_\gamma$ defined this way is bounded between 0 and 1.

1.2 PAC-Bayesian framework

The PAC-Bayesian framework [9, 10] provides generalization guarantees for randomized predictors, drawn from a learned distribution $Q$ (as opposed to a single learned predictor) that depends on the training data. In particular, let $f_w$ be any predictor (not necessarily a neural network) learned from the training data and parametrized by $w$. We consider the distribution $Q$ over predictors of the form $f_{w+u}$, where $u$ is a random variable whose distribution may also depend on the training data. Given a prior distribution $P$ over the set of predictors that is independent of the training data, the PAC-Bayes theorem states that with probability at least $1-\delta$ over the draw of the training data, the expected error of $f_{w+u}$ can be bounded as follows [8]:

$$\mathbb{E}_u[L_0(f_{w+u})] \le \mathbb{E}_u[\hat{L}_0(f_{w+u})] + \sqrt{\frac{KL(w+u\,\|\,P) + \ln\frac{2m}{\delta}}{2(m-1)}}. \quad (1)$$

To get a bound on the expected risk $L_0(f_w)$ for a single predictor $f_w$, we need to relate the expected perturbed loss $\mathbb{E}_u[L_0(f_{w+u})]$ in the above equation with $L_0(f_w)$. Toward this we use Lemma 1 below, which gives a margin-based generalization bound derived from the PAC-Bayesian bound (1).
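As a concrete illustration of the empirical margin loss $\hat{L}_\gamma$ defined above, the following numpy sketch (ours, not from the paper; the network, data, and function names are made up for illustration) computes $\hat{L}_\gamma$ for a small ReLU network:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(weights, x):
    # f_w(x) = W_d relu(W_{d-1} ... relu(W_1 x)); returns a score per class.
    out = x
    for W in weights[:-1]:
        out = relu(W @ out)
    return weights[-1] @ out

def empirical_margin_loss(weights, X, y, gamma):
    # Fraction of examples whose correct-class score fails to exceed every
    # other class score by more than gamma (gamma = 0 gives the 0-1 error).
    losses = []
    for x, yi in zip(X, y):
        scores = forward(weights, x)
        runner_up = np.max(np.delete(scores, yi))
        losses.append(float(scores[yi] <= gamma + runner_up))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 5)), rng.standard_normal((3, 8))]
X = rng.standard_normal((100, 5))
y = rng.integers(0, 3, size=100)
l0 = empirical_margin_loss(weights, X, y, 0.0)
lg = empirical_margin_loss(weights, X, y, 1.0)
assert 0.0 <= l0 <= lg <= 1.0  # enlarging the margin can only increase the loss
```

Note that $\hat{L}_\gamma$ is monotone in $\gamma$, which is why $\hat{L}_0 \le \hat{L}_\gamma$ above; this is consistent with the loss being bounded between 0 and 1.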
Lemma 1. Let $f_w(x): \mathcal{X} \to \mathbb{R}^k$ be any predictor (not necessarily a neural network) with parameters $w$, and $P$ be any distribution on the parameters that is independent of the training data. Then, for any $\gamma, \delta > 0$, with probability $\ge 1-\delta$ over the training set of size $m$, for any $w$, and any random perturbation $u$ s.t. $\Pr_u\left[\max_{x\in\mathcal{X}} |f_{w+u}(x) - f_w(x)|_\infty < \frac{\gamma}{4}\right] \ge \frac{1}{2}$, we have:

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + 4\sqrt{\frac{KL(w+u\,\|\,P) + \ln\frac{6m}{\delta}}{m-1}}.$$

In the above expression the KL is evaluated for a fixed $w$ and only $u$ is random, i.e. the distribution of $w+u$ is the distribution of $u$ shifted by $w$. The lemma is analogous to the similar analysis of Langford and Shawe-Taylor [7] and McAllester [8] obtaining PAC-Bayes margin bounds for linear predictors. As we state the lemma, it is not specific to linear separators, nor to neural networks, and holds generally for any real-valued predictor. We next show how to utilize the above general PAC-Bayes bound to prove generalization guarantees for feedforward networks based on the spectral norms of their layers.

2 Generalization Bound

In this section we present our generalization bound for feedforward networks with ReLU activations, derived using the PAC-Bayesian framework. Langford and Caruana [6], and more recently Dziugaite and Roy [4] and Neyshabur et al. [12], used PAC-Bayes bounds to analyze generalization behavior in neural networks, evaluating the KL-divergence, the perturbation error $L[f_{w+u}] - L[f_w]$, or the entire bound numerically. Here, we use the PAC-Bayes framework as a tool to analytically derive a margin-based bound in terms of norms of the weights. As we saw in Lemma 1, the key to doing so is bounding the change in the output of the network when the weights are perturbed. In Lemma 2 below, we bound this change in terms of the spectral norms of the layers.
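The proof below instantiates Lemma 1 with a Gaussian prior and a Gaussian perturbation, for which the KL term has the standard closed form $KL(\mathcal{N}(w,\sigma^2 I)\,\|\,\mathcal{N}(0,\sigma^2 I)) = |w|^2/2\sigma^2$. The following sketch (ours; the Monte-Carlo estimator and all names are our own) sanity-checks that closed form numerically:

```python
import numpy as np

def kl_shifted_isotropic_gaussian(w, sigma):
    # KL( N(w, sigma^2 I) || N(0, sigma^2 I) ) = |w|^2 / (2 sigma^2)
    return float(w @ w) / (2.0 * sigma ** 2)

# Monte-Carlo estimate of the same KL:
#   KL = E_{z ~ N(w, s^2 I)}[ log p_w(z) - log p_0(z) ]
#      = E[ (|z|^2 - |z - w|^2) / (2 s^2) ]
rng = np.random.default_rng(1)
w = rng.standard_normal(10)
sigma = 0.5
z = w + sigma * rng.standard_normal((200_000, 10))
mc = np.mean(np.sum(z ** 2, axis=1) - np.sum((z - w) ** 2, axis=1)) / (2 * sigma ** 2)
exact = kl_shifted_isotropic_gaussian(w, sigma)
assert abs(mc - exact) / exact < 0.05  # estimator agrees with the closed form
```

The $|w|^2/2\sigma^2$ form is what makes the Frobenius norms of the layers appear in the KL term of the final bound.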
Lemma 2 (Perturbation Bound). For any $B, d > 0$, let $f_w: \mathcal{X}_{B,n} \to \mathbb{R}^k$ be a $d$-layer network. Then for any $w$, any $x \in \mathcal{X}_{B,n}$, and any perturbation $u = \mathrm{vec}(\{U_i\}_{i=1}^d)$ such that $\|U_i\|_2 \le \frac{1}{d}\|W_i\|_2$, the change in the output of the network can be bounded as follows:

$$|f_{w+u}(x) - f_w(x)|_2 \le eB\left(\prod_{i=1}^d \|W_i\|_2\right)\sum_{i=1}^d \frac{\|U_i\|_2}{\|W_i\|_2}.$$

Next we use the above perturbation bound and the PAC-Bayes result (Lemma 1) to derive the following generalization guarantee.

Theorem 1 (Generalization Bound). For any $B, d, h > 0$, let $f_w: \mathcal{X}_{B,n} \to \mathbb{R}^k$ be a $d$-layer feedforward network with ReLU activations. Then, for any $\delta, \gamma > 0$, with probability $\ge 1-\delta$ over a training set of size $m$, for any $w$, we have:

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + O\left(\sqrt{\frac{B^2 d^2 h \ln(dh)\,\prod_{i=1}^d \|W_i\|_2^2\,\sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2} + \ln\frac{dm}{\delta}}{\gamma^2 m}}\right). \quad (2)$$

Comparing the above result to Bartlett et al.'s [1] boils down to comparing $\sqrt{h}\,\|W_i\|_F$ with $\|W_i\|_1$. Recalling that $W_i$ is an $h\times h$ matrix, we have $\|W_i\|_F \le \|W_i\|_1 \le h\|W_i\|_F$. When the weights are fairly dense and of uniform magnitude, the second inequality will be tight, and we will have $\sqrt{h}\,\|W_i\|_F \ll \|W_i\|_1$, and Theorem 1 will dominate. When the weights are sparse with roughly a constant number of significant weights per unit (i.e. a weight matrix with sparsity $\Theta(h)$), the bounds will be similar. Bartlett et al.'s [1] bound will dominate when the weights are extremely sparse, with many fewer significant weights than units, i.e. when most units do not have any incoming or outgoing weights of significant magnitude.

Proof of Theorem 1. The proof involves mainly two steps. In the first step we calculate the maximum allowed perturbation of the parameters satisfying a given margin condition $\gamma$, using Lemma 2. In the second step we calculate the KL term in the PAC-Bayes bound in Lemma 1 for this value of the perturbation.

Let $\beta = \left(\prod_{i=1}^d \|W_i\|_2\right)^{1/d}$ and consider a network with the normalized weights $\tilde{W}_i = \frac{\beta}{\|W_i\|_2} W_i$. Due to the homogeneity of the ReLU, we have that for feedforward networks with ReLU activations $f_{\tilde{w}} = f_w$, and so the empirical and expected loss (including margin loss) is the same for $w$ and $\tilde{w}$. We can also verify that $\prod_{i=1}^d \|W_i\|_2 = \prod_{i=1}^d \|\tilde{W}_i\|_2$ and $\frac{\|W_i\|_F}{\|W_i\|_2} = \frac{\|\tilde{W}_i\|_F}{\|\tilde{W}_i\|_2}$, and so the excess error in the Theorem statement
is also invariant to this transformation. It is therefore sufficient to prove the theorem only for the normalized weights $\tilde{w}$, and hence we assume w.l.o.g. that the spectral norm is equal across layers, i.e. for any layer $i$, $\|W_i\|_2 = \beta$.

Choose the distribution of the prior $P$ to be $\mathcal{N}(0, \sigma^2 I)$, and consider the random perturbation $u \sim \mathcal{N}(0, \sigma^2 I)$, with the same $\sigma$, which we will set later according to $\beta$. More precisely, since the prior cannot depend on the learned predictor $w$ or its norm, we will set $\sigma$ based on an approximation $\tilde{\beta}$. For each value of $\tilde{\beta}$ on a pre-determined grid, we will compute the PAC-Bayes bound, establishing the generalization guarantee for all $w$ for which $|\beta - \tilde{\beta}| \le \frac{1}{d}\beta$, and ensuring that each relevant value of $\beta$ is covered by some $\tilde{\beta}$ on the grid. We will then take a union bound over all $\tilde{\beta}$ on the grid. For now, we consider a fixed $\tilde{\beta}$ and the $w$ for which $|\beta - \tilde{\beta}| \le \frac{1}{d}\beta$, and hence $\frac{1}{e}\beta^{d-1} \le \tilde{\beta}^{d-1} \le e\beta^{d-1}$.

Since $u \sim \mathcal{N}(0, \sigma^2 I)$, we get the following bound for the spectral norm of $U_i$ [?]:

$$\Pr_{U_i \sim \mathcal{N}(0,\sigma^2 I)}\left[\|U_i\|_2 > t\right] \le 2h e^{-t^2/2h\sigma^2}.$$

Taking a union bound over the layers, we get that, with probability $\ge \frac{1}{2}$, the spectral norm of the perturbation $U_i$ in each layer is bounded by $\sigma\sqrt{2h\ln(4dh)}$. Plugging this spectral norm bound into Lemma 2, we have that with probability at least $\frac{1}{2}$,

$$\max_{x\in\mathcal{X}_{B,n}} |f_{w+u}(x) - f_w(x)|_2 \le eB\beta^d \sum_{i=1}^d \frac{\|U_i\|_2}{\beta} = eB\beta^{d-1}\sum_{i=1}^d \|U_i\|_2 \le e^2 d B \tilde{\beta}^{d-1}\sigma\sqrt{2h\ln(4dh)} \le \frac{\gamma}{4},$$
where we choose $\sigma = \frac{\gamma}{42\, d B \tilde{\beta}^{d-1}\sqrt{h\ln(4dh)}}$ to get the last inequality. Hence, the perturbation $u$ with the above value of $\sigma$ satisfies the assumptions of Lemma 1.

We now calculate the KL term in Lemma 1 with the chosen distributions for $P$ and $u$, for the above value of $\sigma$:

$$KL(w+u\,\|\,P) \le \frac{|w|^2}{2\sigma^2} \le O\left(\frac{B^2 d^2 h\ln(dh)}{\gamma^2}\prod_{i=1}^d \|W_i\|_2^2 \sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2}\right).$$

Hence, for any $\tilde{\beta}$, with probability $\ge 1-\delta$ and for all $w$ such that $|\beta - \tilde{\beta}| \le \frac{1}{d}\beta$, we have:

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + O\left(\sqrt{\frac{B^2 d^2 h\ln(dh)\prod_{i=1}^d \|W_i\|_2^2 \sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2} + \ln\frac{m}{\delta}}{\gamma^2 m}}\right). \quad (3)$$

Finally we need to take a union bound over the different choices of $\tilde{\beta}$. Let us see how many choices of $\tilde{\beta}$ we need to ensure that every relevant $\beta$ has a $\tilde{\beta}$ in the grid s.t. $|\beta - \tilde{\beta}| \le \frac{1}{d}\beta$. We only need to consider values of $\beta$ in the range $\left(\frac{\gamma}{2B}\right)^{1/d} \le \beta \le \left(\frac{\gamma\sqrt{m}}{2B}\right)^{1/d}$. For $\beta$ outside this range the theorem statement holds trivially: recall that the LHS of the theorem statement, $L_0(f_w)$, is always bounded by 1. If $\beta^d < \frac{\gamma}{2B}$, then for any $x$, $|f_w(x)| \le \beta^d B \le \frac{\gamma}{2}$ and therefore $\hat{L}_\gamma = 1$. Alternately, if $\beta^d > \frac{\gamma\sqrt{m}}{2B}$, then the second term in equation (2) is greater than one. Hence, we only need to consider values of $\beta$ in the range discussed above. Since we need $\tilde{\beta}$ to satisfy $|\beta - \tilde{\beta}| \le \frac{1}{d}\beta$, and $\beta \ge \left(\frac{\gamma}{2B}\right)^{1/d}$, a grid of spacing $\frac{1}{d}\left(\frac{\gamma}{2B}\right)^{1/d}$ suffices, so the size of the cover we need to consider is bounded by $d m^{1/2d}$. Taking a union bound over the choices of $\tilde{\beta}$ in this cover, and using the bound in equation (3), gives us the theorem statement. ∎

Proof of Lemma 2. Let $\Delta_i = |f^i_{w+u}(x) - f^i_w(x)|_2$. We will prove by induction that for any $i \ge 0$:

$$\Delta_i \le \left(1 + \frac{1}{d}\right)^i \left(\prod_{j=1}^i \|W_j\|_2\right)|x|_2 \sum_{j=1}^i \frac{\|U_j\|_2}{\|W_j\|_2}.$$

The above inequality, together with $\left(1+\frac{1}{d}\right)^d \le e$, proves the lemma statement. The induction base clearly holds since $\Delta_0 = |x - x|_2 = 0$. For any $i \ge 0$, we have the following:

$$\begin{aligned}
\Delta_{i+1} &= \left|\left(W_{i+1} + U_{i+1}\right)\phi\!\left(f^i_{w+u}(x)\right) - W_{i+1}\,\phi\!\left(f^i_w(x)\right)\right|_2\\
&= \left|\left(W_{i+1} + U_{i+1}\right)\left(\phi\!\left(f^i_{w+u}(x)\right) - \phi\!\left(f^i_w(x)\right)\right) + U_{i+1}\,\phi\!\left(f^i_w(x)\right)\right|_2\\
&\le \left(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\right)\left|\phi\!\left(f^i_{w+u}(x)\right) - \phi\!\left(f^i_w(x)\right)\right|_2 + \|U_{i+1}\|_2\left|\phi\!\left(f^i_w(x)\right)\right|_2\\
&\le \left(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\right)\left|f^i_{w+u}(x) - f^i_w(x)\right|_2 + \|U_{i+1}\|_2\left|f^i_w(x)\right|_2\\
&= \Delta_i\left(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\right) + \|U_{i+1}\|_2\left|f^i_w(x)\right|_2,
\end{aligned}$$

where the last inequality is by the Lipschitz property of the activation function and using $\phi(0) = 0$. The $\ell_2$ norm of the output of layer $i$ is bounded by $|x|_2\prod_{j=1}^i \|W_j\|_2$, and by the lemma assumption we have $\|U_{i+1}\|_2 \le \frac{1}{d}\|W_{i+1}\|_2$. Therefore, using the induction hypothesis, we get the following bound:

$$\begin{aligned}
\Delta_{i+1} &\le \Delta_i\left(1 + \frac{1}{d}\right)\|W_{i+1}\|_2 + \|U_{i+1}\|_2\,|x|_2\prod_{j=1}^i \|W_j\|_2\\
&\le \left(1+\frac{1}{d}\right)^{i+1}\left(\prod_{j=1}^{i+1}\|W_j\|_2\right)|x|_2\sum_{j=1}^i \frac{\|U_j\|_2}{\|W_j\|_2} + \frac{\|U_{i+1}\|_2}{\|W_{i+1}\|_2}\,|x|_2\prod_{j=1}^{i+1}\|W_j\|_2\\
&\le \left(1+\frac{1}{d}\right)^{i+1}\left(\prod_{j=1}^{i+1}\|W_j\|_2\right)|x|_2\sum_{j=1}^{i+1} \frac{\|U_j\|_2}{\|W_j\|_2}. \qquad\blacksquare
\end{aligned}$$
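The perturbation bound of Lemma 2 can be checked numerically on a random ReLU network. The following sketch (ours; the dimensions and helper names are made up for illustration) draws perturbations satisfying the assumption $\|U_i\|_2 \le \frac{1}{d}\|W_i\|_2$ and verifies the inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
d, h, n, k, B = 4, 20, 10, 5, 1.0
dims = [n] + [h] * (d - 1) + [k]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(d)]

def forward(ws, x):
    # f_w(x) = W_d relu(W_{d-1} ... relu(W_1 x))
    out = x
    for W in ws[:-1]:
        out = np.maximum(W @ out, 0.0)
    return ws[-1] @ out

spec = lambda M: np.linalg.norm(M, 2)  # spectral norm (largest singular value)

# Perturbations scaled so that ||U_i||_2 <= (1/d) ||W_i||_2, as Lemma 2 requires.
Us = []
for W in Ws:
    U = rng.standard_normal(W.shape)
    Us.append(U * (spec(W) / (d * spec(U))) * rng.uniform(0.1, 1.0))

x = rng.standard_normal(n)
x *= B / np.linalg.norm(x)  # put x in the domain X_{B,n}
lhs = np.linalg.norm(forward([W + U for W, U in zip(Ws, Us)], x) - forward(Ws, x))
rhs = (np.e * B * np.prod([spec(W) for W in Ws])
       * sum(spec(U) / spec(W) for W, U in zip(Ws, Us)))
assert lhs <= rhs  # Lemma 2: change in output is within the spectral bound
```

In practice the bound is quite loose on random networks; its value lies in how the product of spectral norms, rather than any elementwise norm, controls the sensitivity to perturbation.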
References

[1] P. Bartlett, D. J. Foster, and M. Telgarsky. Spectrally-normalized margin bounds for neural networks. arXiv preprint, 2017.
[2] P. L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.
[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
[4] G. K. Dziugaite and D. M. Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint, 2017.
[5] N. Harvey, C. Liaw, and A. Mehrabian. Nearly-tight VC-dimension bounds for piecewise linear neural networks. arXiv preprint, 2017.
[6] J. Langford and R. Caruana. (Not) bounding the true error. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. MIT Press, 2001.
[7] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In Advances in Neural Information Processing Systems, 2003.
[8] D. McAllester. Simplified PAC-Bayesian margin bounds. Lecture Notes in Computer Science, pages 203–215, 2003.
[9] D. A. McAllester. Some PAC-Bayesian theorems. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, 1998.
[10] D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory. ACM, 1999.
[11] B. Neyshabur, R. Tomioka, and N. Srebro. Norm-based capacity control in neural networks. In Proceedings of the 28th Conference on Learning Theory (COLT), 2015.
[12] B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro. Exploring generalization in deep learning. arXiv preprint, 2017.
Published as a conference paper at ICLR 2018.
More informationLecture 2: Correlated Topic Model
Probabilistic Moels for Unsupervise Learning Spring 203 Lecture 2: Correlate Topic Moel Inference for Correlate Topic Moel Yuan Yuan First of all, let us make some claims about the parameters an variables
More informationImage Denoising Using Spatial Adaptive Thresholding
International Journal of Engineering Technology, Management an Applie Sciences Image Denoising Using Spatial Aaptive Thresholing Raneesh Mishra M. Tech Stuent, Department of Electronics & Communication,
More informationDatabase-friendly Random Projections
Database-frienly Ranom Projections Dimitris Achlioptas Microsoft ABSTRACT A classic result of Johnson an Linenstrauss asserts that any set of n points in -imensional Eucliean space can be embee into k-imensional
More informationLinear First-Order Equations
5 Linear First-Orer Equations Linear first-orer ifferential equations make up another important class of ifferential equations that commonly arise in applications an are relatively easy to solve (in theory)
More informationFunction Spaces. 1 Hilbert Spaces
Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure
More informationA New Minimum Description Length
A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum
More informationPETER L. BARTLETT AND MARTEN H. WEGKAMP
CLASSIFICATION WITH A REJECT OPTION USING A HINGE LOSS PETER L. BARTLETT AND MARTEN H. WEGKAMP Abstract. We consier the problem of binary classification where the classifier can, for a particular cost,
More informationLinear and quadratic approximation
Linear an quaratic approximation November 11, 2013 Definition: Suppose f is a function that is ifferentiable on an interval I containing the point a. The linear approximation to f at a is the linear function
More informationThe Press-Schechter mass function
The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for
More informationAgmon Kolmogorov Inequalities on l 2 (Z d )
Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,
More informationRamsey numbers of some bipartite graphs versus complete graphs
Ramsey numbers of some bipartite graphs versus complete graphs Tao Jiang, Michael Salerno Miami University, Oxfor, OH 45056, USA Abstract. The Ramsey number r(h, K n ) is the smallest positive integer
More informationAll s Well That Ends Well: Supplementary Proofs
All s Well That Ens Well: Guarantee Resolution of Simultaneous Rigi Boy Impact 1:1 All s Well That Ens Well: Supplementary Proofs This ocument complements the paper All s Well That Ens Well: Guarantee
More informationTractability results for weighted Banach spaces of smooth functions
Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March
More informationSelf-normalized Martingale Tail Inequality
Online-to-Confience-Set Conversions an Application to Sparse Stochastic Banits A Self-normalize Martingale Tail Inequality The self-normalize martingale tail inequality that we present here is the scalar-value
More informationSOME RESULTS ON THE GEOMETRY OF MINKOWSKI PLANE. Bing Ye Wu
ARCHIVUM MATHEMATICUM (BRNO Tomus 46 (21, 177 184 SOME RESULTS ON THE GEOMETRY OF MINKOWSKI PLANE Bing Ye Wu Abstract. In this paper we stuy the geometry of Minkowski plane an obtain some results. We focus
More informationRobustness and Perturbations of Minimal Bases
Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important
More informationIntroduction to Machine Learning
How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression
More information12.11 Laplace s Equation in Cylindrical and
SEC. 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential 593 2. Laplace s Equation in Cylinrical an Spherical Coorinates. Potential One of the most important PDEs in physics an engineering
More informationGeneralization in Deep Networks
Generalization in Deep Networks Peter Bartlett BAIR UC Berkeley November 28, 2017 1 / 29 Deep neural networks Game playing (Jung Yeon-Je/AFP/Getty Images) 2 / 29 Deep neural networks Image recognition
More informationMulti-View Clustering via Canonical Correlation Analysis
Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms
More informationLevel Construction of Decision Trees in a Partition-based Framework for Classification
Level Construction of Decision Trees in a Partition-base Framework for Classification Y.Y. Yao, Y. Zhao an J.T. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canaa S4S
More informationDEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS
DEGREE DISTRIBUTION OF SHORTEST PATH TREES AND BIAS OF NETWORK SAMPLING ALGORITHMS SHANKAR BHAMIDI 1, JESSE GOODMAN 2, REMCO VAN DER HOFSTAD 3, AND JÚLIA KOMJÁTHY3 Abstract. In this article, we explicitly
More informationTMA 4195 Matematisk modellering Exam Tuesday December 16, :00 13:00 Problems and solution with additional comments
Problem F U L W D g m 3 2 s 2 0 0 0 0 2 kg 0 0 0 0 0 0 Table : Dimension matrix TMA 495 Matematisk moellering Exam Tuesay December 6, 2008 09:00 3:00 Problems an solution with aitional comments The necessary
More informationA Weak First Digit Law for a Class of Sequences
International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of
More informationWESD - Weighted Spectral Distance for Measuring Shape Dissimilarity
1 WESD - Weighte Spectral Distance for Measuring Shape Dissimilarity Ener Konukoglu, Ben Glocker, Antonio Criminisi an Kilian M. Pohl Abstract This article presents a new istance for measuring shape issimilarity
More informationHyperbolic Systems of Equations Posed on Erroneous Curved Domains
Hyperbolic Systems of Equations Pose on Erroneous Curve Domains Jan Norström a, Samira Nikkar b a Department of Mathematics, Computational Mathematics, Linköping University, SE-58 83 Linköping, Sween (
More informationDesigning of Acceptance Double Sampling Plan for Life Test Based on Percentiles of Exponentiated Rayleigh Distribution
International Journal of Statistics an Systems ISSN 973-675 Volume, Number 3 (7), pp. 475-484 Research Inia Publications http://www.ripublication.com Designing of Acceptance Double Sampling Plan for Life
More informationarxiv: v4 [cs.ds] 7 Mar 2014
Analysis of Agglomerative Clustering Marcel R. Ackermann Johannes Blömer Daniel Kuntze Christian Sohler arxiv:101.697v [cs.ds] 7 Mar 01 Abstract The iameter k-clustering problem is the problem of partitioning
More information