A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
Toyota Technological Institute at Chicago
{bneyshabur, srinadh, mcallester, ...}

Abstract

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norms of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.

1 Introduction

In this note we present and prove a margin-based generalization bound for feedforward neural networks that depends on the product of the spectral norms of the weights in each layer, as well as the Frobenius norm of the weights.

Our generalization bound shares much similarity with a margin-based generalization bound recently presented by Bartlett et al. [1]. Both bounds depend similarly on the product of the spectral norms of each layer, multiplied by a factor that is additive across layers. In addition, Bartlett et al.'s [1] bound depends on the elementwise $\ell_1$-norm of the weights in each layer, while our bound depends on the Frobenius (elementwise $\ell_2$) norm of the weights in each layer, with an additional multiplicative dependence on the width. The two bounds are thus not directly comparable, and each one dominates in a different regime, roughly depending on the sparsity of the weights.

More importantly, our proof technique is entirely different, and arguably simpler, than that of Bartlett et al. [1]. We derive our bound using PAC-Bayes analysis, and more specifically a generic PAC-Bayes margin bound (Lemma 1). The main ingredient is a perturbation bound (Lemma 2), bounding the change in the output of a network when the weights are perturbed, in terms of the product of the spectral norms of the layers. This is an entirely different analysis approach from Bartlett et al.'s [1] covering number analysis. We hope our analysis can give more direct intuition into the different ingredients in the bound and will allow modifying the analysis, e.g. by using different prior and perturbation distributions in the PAC-Bayes bound, to obtain tighter bounds, perhaps with dependence on different layer-wise norms.

We note that prior bounds in terms of elementwise or unit-wise norms (such as the Frobenius norm and elementwise $\ell_1$ norms of layers), without a spectral norm dependence, all have a multiplicative dependence across layers or an exponential dependence on depth (Bartlett and Mendelson [3], Neyshabur et al. [11]), or hold only for constant-depth networks (Bartlett [2]). Here only the spectral norm is multiplied across layers, and thus if the spectral norms are close to one, the exponential dependence on depth can be avoided.

1.1 Preliminaries

Consider the classification task with input domain $\mathcal{X}_{B,n} = \{x \in \mathbb{R}^n : \sum_{i=1}^n x_i^2 \leq B^2\}$ and output domain $\mathbb{R}^k$, where the output of the model is a score for each class and the class with the maximum score is selected as the predicted label. Let $f_w(x) : \mathcal{X}_{B,n} \to \mathbb{R}^k$ be the function computed by a $d$-layer feed-forward network for the classification task with parameters $w = \mathrm{vec}(\{W_i\}_{i=1}^d)$, $f_w(x) = W_d\,\phi(W_{d-1}\,\phi(\cdots \phi(W_1 x)))$, where $\phi$ is the ReLU activation function. Let $f^i_w(x)$ denote the output of layer $i$ before activation and $h$ be an upper bound on the number of output units in each layer. We can then define fully connected feedforward networks recursively: $f^1_w(x) = W_1 x$ and $f^i_w(x) = W_i\,\phi(f^{i-1}_w(x))$. Let $\|\cdot\|_F$, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the Frobenius norm, the elementwise $\ell_1$ norm and the spectral norm of a matrix, respectively. We further denote the $\ell_p$ norm of a vector by $\|\cdot\|_p$.
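To make the setup concrete, here is a minimal NumPy sketch (not part of the original note; the network sizes and the `relu`, `forward` and `layer_norms` helpers are all illustrative assumptions) of the recursive definition above, together with the per-layer spectral and Frobenius norms that appear later in the bound:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(Ws, x):
    # f_w(x) = W_d phi(W_{d-1} phi(... phi(W_1 x))); no activation after the last layer.
    z = x
    for W in Ws[:-1]:
        z = relu(W @ z)
    return Ws[-1] @ z

def layer_norms(Ws):
    # Spectral norm (largest singular value) and Frobenius norm of each layer.
    spec = [np.linalg.norm(W, 2) for W in Ws]
    frob = [np.linalg.norm(W, "fro") for W in Ws]
    return spec, frob

# Illustrative sizes: d = 3 layers, width h = 50, inputs in R^20, k = 10 classes.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(50, 20)), rng.normal(size=(50, 50)), rng.normal(size=(10, 50))]
x = rng.normal(size=20)
scores = forward(Ws, x)        # one score per class
spec, frob = layer_norms(Ws)
```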

For any distribution $D$ and margin $\gamma > 0$, we define the expected margin loss as follows:
$$L_\gamma(f_w) = \Pr_{(x,y)\sim D}\Big[f_w(x)[y] \leq \gamma + \max_{j \neq y} f_w(x)[j]\Big].$$
Let $\hat{L}_\gamma(f_w)$ be the empirical estimate of the above expected margin loss. Since setting $\gamma = 0$ corresponds to the classification loss, we will use $L_0(f_w)$ and $\hat{L}_0(f_w)$ to refer to the expected risk and the training error. The loss $L_\gamma$ defined this way is bounded between 0 and 1.

1.2 PAC-Bayesian framework

The PAC-Bayesian framework [9, 10] provides generalization guarantees for randomized predictors, drawn from a learned distribution $Q$ (as opposed to a single learned predictor) that depends on the training data. In particular, let $f_w$ be any predictor (not necessarily a neural network) learned from the training data and parametrized by $w$. We consider the distribution $Q$ over predictors of the form $f_{w+u}$, where $u$ is a random variable whose distribution may also depend on the training data. Given a "prior" distribution $P$ over the set of predictors that is independent of the training data, the PAC-Bayes theorem states that with probability at least $1-\delta$ over the draw of the training data, the expected error of $f_{w+u}$ can be bounded as follows [8]:
$$\mathbb{E}_u[L_0(f_{w+u})] \leq \mathbb{E}_u[\hat{L}_0(f_{w+u})] + 2\sqrt{\frac{2\big(\mathrm{KL}(w+u\,\|\,P) + \ln\frac{2m}{\delta}\big)}{m-1}}. \qquad (1)$$

To get a bound on the expected risk $L_0(f_w)$ for a single predictor $f_w$, we need to relate the expected perturbed loss $\mathbb{E}_u[L_0(f_{w+u})]$ in the above equation with $L_0(f_w)$. Toward this we use the following lemma, which gives a margin-based generalization bound derived from the PAC-Bayesian bound (1):

Lemma 1. Let $f_w(x) : \mathcal{X} \to \mathbb{R}^k$ be any predictor (not necessarily a neural network) with parameters $w$, and $P$ be any distribution on the parameters that is independent of the training data. Then, for any $\gamma, \delta > 0$, with probability $\geq 1-\delta$ over the training set of size $m$, for any $w$ and any random perturbation $u$ s.t. $\Pr_u\big[\max_{x \in \mathcal{X}} \|f_{w+u}(x) - f_w(x)\|_\infty < \frac{\gamma}{4}\big] \geq \frac{1}{2}$, we have:
$$L_0(f_w) \leq \hat{L}_\gamma(f_w) + 4\sqrt{\frac{\mathrm{KL}(w+u\,\|\,P) + \ln\frac{6m}{\delta}}{m-1}}.$$

In the above expression the KL is evaluated for a fixed $w$ and only $u$ is random, i.e. the distribution of $w+u$ is the distribution of $u$ shifted by $w$. The lemma is analogous to the similar analysis of Langford and Shawe-Taylor [7] and McAllester [8] obtaining PAC-Bayes margin bounds for linear predictors. As we state the lemma, it is not specific to linear separators, nor to neural networks, and holds generally for any real-valued predictor. We next show how to utilize the above general PAC-Bayes bound to prove generalization guarantees for feedforward networks based on the spectral norms of their layers.
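For intuition, the following sketch (our illustration, not the note's; `margin_loss` and `lemma1_rhs` are hypothetical helpers) evaluates the two ingredients of Lemma 1 numerically for the special case $P = \mathcal{N}(0, \sigma^2 I)$ and $u \sim \mathcal{N}(0, \sigma^2 I)$, where the KL term reduces to $\|w\|_2^2 / (2\sigma^2)$:

```python
import numpy as np

def margin_loss(scores, y, gamma):
    # Fraction of examples with f_w(x)[y] <= gamma + max_{j != y} f_w(x)[j],
    # for an (m, k) score matrix and (m,) integer labels.
    m, k = scores.shape
    true_class = scores[np.arange(m), y]
    masked = np.where(np.eye(k, dtype=bool)[y], -np.inf, scores)
    return np.mean(true_class <= gamma + masked.max(axis=1))

def lemma1_rhs(emp_margin_loss, w_sq_norm, sigma, m, delta):
    # RHS of Lemma 1 with KL(w+u || P) = |w|^2 / (2 sigma^2) for matched Gaussians.
    kl = w_sq_norm / (2.0 * sigma**2)
    return emp_margin_loss + 4.0 * np.sqrt((kl + np.log(6.0 * m / delta)) / (m - 1))
```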
2 Generalization Bound

In this section we present our generalization bound for feedforward networks with ReLU activations, derived using the PAC-Bayesian framework. Langford and Caruana [6], and more recently Dziugaite and Roy [4] and Neyshabur et al. [12], used PAC-Bayes bounds to analyze generalization behavior in neural networks, evaluating the KL-divergence, the perturbation error $L[f_{w+u}] - L[f_w]$, or the entire bound numerically. Here, we use the PAC-Bayes framework as a tool to analytically derive a margin-based bound in terms of norms of the weights. As we saw in Lemma 1, the key to doing so is bounding the change in the output of the network when the weights are perturbed. In the following lemma, we bound this change in terms of the spectral norms of the layers:

Lemma 2 (Perturbation Bound). For any $B, d > 0$, let $f_w : \mathcal{X}_{B,n} \to \mathbb{R}^k$ be a $d$-layer network. Then for any $w$, any $x \in \mathcal{X}_{B,n}$, and any perturbation $u = \mathrm{vec}(\{U_i\}_{i=1}^d)$ such that $\|U_i\|_2 \leq \frac{1}{d}\|W_i\|_2$, the change in the output of the network can be bounded as follows:
$$\|f_{w+u}(x) - f_w(x)\|_2 \leq eB\Big(\prod_{i=1}^d \|W_i\|_2\Big)\sum_{i=1}^d \frac{\|U_i\|_2}{\|W_i\|_2}.$$
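As a sanity check, one can compare the two sides of Lemma 2 on random data. The sketch below is a hypothetical, self-contained example (the rescaling puts each perturbation exactly at the allowed level $\|U_i\|_2 = \|W_i\|_2 / d$):

```python
import numpy as np

def fw(Ws, x):
    # d-layer ReLU network, no activation after the last layer.
    z = x
    for W in Ws[:-1]:
        z = np.maximum(W @ z, 0.0)
    return Ws[-1] @ z

rng = np.random.default_rng(0)
d, B = 3, 1.0
Ws = [rng.normal(size=(50, 20)), rng.normal(size=(50, 50)), rng.normal(size=(10, 50))]
x = rng.normal(size=20)
x *= B / np.linalg.norm(x)               # ensure |x|_2 <= B

# Draw perturbations and rescale so that |U_i|_2 = |W_i|_2 / d.
Us = []
for W in Ws:
    U = rng.normal(size=W.shape)
    Us.append(U * np.linalg.norm(W, 2) / (d * np.linalg.norm(U, 2)))

lhs = np.linalg.norm(fw([W + U for W, U in zip(Ws, Us)], x) - fw(Ws, x))
spec = [np.linalg.norm(W, 2) for W in Ws]
rhs = np.e * B * np.prod(spec) * sum(np.linalg.norm(U, 2) / s
                                     for U, s in zip(Us, spec))
assert lhs <= rhs                        # Lemma 2 guarantees this
```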

Next we use the above perturbation bound and the PAC-Bayes result (Lemma 1) to derive the following generalization guarantee.

Theorem 1 (Generalization Bound). For any $B, d, h > 0$, let $f_w : \mathcal{X}_{B,n} \to \mathbb{R}^k$ be a $d$-layer feedforward network with ReLU activations. Then, for any $\delta, \gamma > 0$, with probability $\geq 1-\delta$ over a training set of size $m$, for any $w$, we have:
$$L_0(f_w) \leq \hat{L}_\gamma(f_w) + O\Bigg(\sqrt{\frac{B^2 d^2 h \ln(dh) \prod_{i=1}^d \|W_i\|_2^2 \sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2} + \ln\frac{dm}{\delta}}{\gamma^2 m}}\Bigg). \qquad (2)$$

Comparing the above result to Bartlett et al.'s [1] boils down to comparing $h\|W_i\|_F^2$ with $\|W_i\|_1^2$. Recalling that $W_i$ is an $h \times h$ matrix, we have that $\|W_i\|_F \leq \|W_i\|_1 \leq h\|W_i\|_F$. When the weights are fairly dense and of uniform magnitude, the second inequality will be tight, we will have $h\|W_i\|_F^2 \ll \|W_i\|_1^2$, and Theorem 1 will dominate. When the weights are sparse with roughly a constant number of significant weights per unit (i.e. a weight matrix with sparsity $\Theta(h)$), the bounds will be similar. Bartlett et al.'s [1] bound will dominate when the weights are extremely sparse, with many fewer significant weights than units, i.e. when most units do not have any incoming or outgoing weights of significant magnitude.

Proof of Theorem 1. The proof involves mainly two steps. In the first step we calculate the maximum allowed perturbation of the parameters that satisfies a given margin condition $\gamma$, using Lemma 2. In the second step we calculate the KL term in the PAC-Bayes bound of Lemma 1 for this value of the perturbation.

Let $\beta = \big(\prod_{i=1}^d \|W_i\|_2\big)^{1/d}$ and consider a network with the normalized weights $\tilde{W}_i = \frac{\beta}{\|W_i\|_2} W_i$. Due to the homogeneity of the ReLU, we have that for feedforward networks with ReLU activations $f_{\tilde{w}} = f_w$, and so the (empirical and expected) loss, including the margin loss, is the same for $w$ and $\tilde{w}$. We can also verify that $\prod_{i=1}^d \|W_i\|_2 = \prod_{i=1}^d \|\tilde{W}_i\|_2$ and $\frac{\|W_i\|_F}{\|W_i\|_2} = \frac{\|\tilde{W}_i\|_F}{\|\tilde{W}_i\|_2}$, and so the excess error in the theorem statement is also invariant to this transformation. It is therefore sufficient to prove the theorem only for the normalized weights $\tilde{w}$, and hence we assume w.l.o.g. that the spectral norm is equal across layers, i.e. for every layer $i$, $\|W_i\|_2 = \beta$.

Choose the distribution of the prior $P$ to be $\mathcal{N}(0, \sigma^2 I)$, and consider the random perturbation $u \sim \mathcal{N}(0, \sigma^2 I)$ with the same $\sigma$, which we will set later according to $\beta$. More precisely, since the prior cannot depend on the learned predictor $w$ or its norm, we will set $\sigma$ based on an approximation $\tilde{\beta}$. For each value of $\tilde{\beta}$ on a pre-determined grid, we will compute the PAC-Bayes bound, establishing the generalization guarantee for all $w$ for which $|\beta - \tilde{\beta}| \leq \frac{1}{d}\beta$, and ensuring that each relevant value of $\beta$ is covered by some $\tilde{\beta}$ on the grid. We will then take a union bound over all $\tilde{\beta}$ on the grid. For now, consider a fixed $\tilde{\beta}$ and the $w$ for which $|\beta - \tilde{\beta}| \leq \frac{1}{d}\beta$, and hence $\frac{1}{e}\beta^{d-1} \leq \tilde{\beta}^{d-1} \leq e\beta^{d-1}$.

Since $u \sim \mathcal{N}(0, \sigma^2 I)$, we have the following standard tail bound for the spectral norm of $U_i$:
$$\Pr_{U_i \sim \mathcal{N}(0,\sigma^2 I)}\big[\|U_i\|_2 > t\big] \leq 2h e^{-t^2/2h\sigma^2}.$$
Taking a union bound over the layers, we get that, with probability $\geq \frac{1}{2}$, the spectral norm of the perturbation $U_i$ in each layer is bounded by $\sigma\sqrt{2h\ln(4dh)}$.
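Note that at $t = \sigma\sqrt{2h\ln(4dh)}$ the tail bound evaluates to $2h \cdot e^{-\ln(4dh)} = \frac{1}{2d}$, so the union bound over the $d$ layers fails with probability at most $\frac{1}{2}$. A small Monte Carlo sketch of this step (illustrative dimensions, not from the note):

```python
import numpy as np

rng = np.random.default_rng(1)
h, d, sigma, trials = 50, 5, 0.1, 1000
level = sigma * np.sqrt(2 * h * np.log(4 * d * h))

# Empirical probability that a single layer's Gaussian perturbation exceeds
# the level; the tail bound guarantees it is at most 1/(2d).
exceed = np.mean([np.linalg.norm(sigma * rng.normal(size=(h, h)), 2) > level
                  for _ in range(trials)])
print(exceed, "<=", 1 / (2 * d))
```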

Plugging this spectral norm bound into Lemma 2, we have that with probability at least $\frac{1}{2}$,
$$\max_{x \in \mathcal{X}_{B,n}} \|f_{w+u}(x) - f_w(x)\|_2 \leq eB\beta^d \sum_{i=1}^d \frac{\|U_i\|_2}{\beta} = eB\beta^{d-1}\sum_{i=1}^d \|U_i\|_2 \leq e^2 d B \tilde{\beta}^{d-1} \sigma\sqrt{2h\ln(4dh)} \leq \frac{\gamma}{4},$$
where we choose $\sigma = \frac{\gamma}{42\, d B \tilde{\beta}^{d-1}\sqrt{h\ln(4dh)}}$ to get the last inequality. Hence, the perturbation $u$ with the above value of $\sigma$ satisfies the assumptions of Lemma 1. We now calculate the KL term in Lemma 1 with the chosen distributions for $P$ and $u$, for the above value of $\sigma$:
$$\mathrm{KL}(w+u\,\|\,P) \leq \frac{\|w\|_2^2}{2\sigma^2} \leq O\Bigg(\frac{B^2 d^2 h \ln(dh)}{\gamma^2}\prod_{i=1}^d \|W_i\|_2^2 \sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2}\Bigg).$$
Hence, for any $\tilde{\beta}$, with probability $\geq 1-\delta$ and for all $w$ such that $|\beta - \tilde{\beta}| \leq \frac{1}{d}\beta$, we have:
$$L_0(f_w) \leq \hat{L}_\gamma(f_w) + O\Bigg(\sqrt{\frac{B^2 d^2 h \ln(dh) \prod_{i=1}^d \|W_i\|_2^2 \sum_{i=1}^d \frac{\|W_i\|_F^2}{\|W_i\|_2^2} + \ln\frac{m}{\delta}}{\gamma^2 m}}\Bigg). \qquad (3)$$

Finally, we need to take a union bound over the different choices of $\tilde{\beta}$. Let us see how many choices of $\tilde{\beta}$ we need to ensure that the grid always contains a $\tilde{\beta}$ with $|\beta - \tilde{\beta}| \leq \frac{1}{d}\beta$. We only need to consider values of $\beta$ in the range $\big(\frac{\gamma}{2B}\big)^{1/d} \leq \beta \leq \big(\frac{\gamma\sqrt{m}}{2B}\big)^{1/d}$. For $\beta$ outside this range the theorem statement holds trivially: recall that the LHS of the theorem statement, $L_0(f_w)$, is always bounded by 1. If $\beta^d < \frac{\gamma}{2B}$, then for any $x$, $|f_w(x)| \leq \beta^d B \leq \frac{\gamma}{2}$ and therefore $\hat{L}_\gamma = 1$. Alternately, if $\beta^d > \frac{\gamma\sqrt{m}}{2B}$, then the second term in equation (2) is greater than one. Hence, we only need to consider values of $\beta$ in the range discussed above. Since we need $\tilde{\beta}$ to satisfy $|\beta - \tilde{\beta}| \leq \frac{1}{d}\beta$ and $\beta \geq \big(\frac{\gamma}{2B}\big)^{1/d}$, a grid with spacing $\frac{1}{d}\big(\frac{\gamma}{2B}\big)^{1/d}$ suffices, so the size of the cover we need to consider is bounded by $d m^{1/2d}$. Taking a union bound over the choices of $\tilde{\beta}$ in this cover and using the bound in equation (3) gives us the theorem statement. ∎

Proof of Lemma 2. Let $\Delta_i = \|f^i_{w+u}(x) - f^i_w(x)\|_2$. We will prove by induction that for any $i \geq 0$:
$$\Delta_i \leq \Big(1+\frac{1}{d}\Big)^i \Big(\prod_{j=1}^i \|W_j\|_2\Big)\,\|x\|_2\,\sum_{j=1}^i \frac{\|U_j\|_2}{\|W_j\|_2}.$$
The above inequality, together with $\big(1+\frac{1}{d}\big)^d \leq e$, proves the lemma statement. The induction base clearly holds, since $\Delta_0 = \|x - x\|_2 = 0$. For any $i \geq 0$, we have the following:
$$\begin{aligned}
\Delta_{i+1} &= \big\|(W_{i+1}+U_{i+1})\,\phi(f^i_{w+u}(x)) - W_{i+1}\,\phi(f^i_w(x))\big\|_2\\
&= \big\|(W_{i+1}+U_{i+1})\big(\phi(f^i_{w+u}(x)) - \phi(f^i_w(x))\big) + U_{i+1}\,\phi(f^i_w(x))\big\|_2\\
&\leq \big(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\big)\,\big\|\phi(f^i_{w+u}(x)) - \phi(f^i_w(x))\big\|_2 + \|U_{i+1}\|_2\,\big\|\phi(f^i_w(x))\big\|_2\\
&\leq \big(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\big)\,\big\|f^i_{w+u}(x) - f^i_w(x)\big\|_2 + \|U_{i+1}\|_2\,\big\|f^i_w(x)\big\|_2\\
&= \Delta_i\big(\|W_{i+1}\|_2 + \|U_{i+1}\|_2\big) + \|U_{i+1}\|_2\,\big\|f^i_w(x)\big\|_2,
\end{aligned}$$
where the last inequality is by the Lipschitz property of the activation function and $\phi(0)=0$. The $\ell_2$ norm of the output of layer $i$ is bounded by $\|x\|_2 \prod_{j=1}^i \|W_j\|_2$, and by the lemma assumption we have $\|U_{i+1}\|_2 \leq \frac{1}{d}\|W_{i+1}\|_2$. Therefore, using the induction hypothesis, we get the following bound:
$$\begin{aligned}
\Delta_{i+1} &\leq \Delta_i\Big(1+\frac{1}{d}\Big)\|W_{i+1}\|_2 + \|U_{i+1}\|_2\,\|x\|_2 \prod_{j=1}^i \|W_j\|_2\\
&\leq \Big(1+\frac{1}{d}\Big)^{i+1}\Big(\prod_{j=1}^{i+1}\|W_j\|_2\Big)\|x\|_2\sum_{j=1}^{i}\frac{\|U_j\|_2}{\|W_j\|_2} + \frac{\|U_{i+1}\|_2}{\|W_{i+1}\|_2}\,\|x\|_2\prod_{j=1}^{i+1}\|W_j\|_2\\
&\leq \Big(1+\frac{1}{d}\Big)^{i+1}\Big(\prod_{j=1}^{i+1}\|W_j\|_2\Big)\|x\|_2\sum_{j=1}^{i+1}\frac{\|U_j\|_2}{\|W_j\|_2}. \qquad\blacksquare
\end{aligned}$$
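To see what the theorem measures in practice, one could evaluate its complexity term for a given set of weights, as in this hypothetical sketch (the constants hidden in the $O(\cdot)$ are dropped, so the resulting number is meaningful only for comparing networks, not as an absolute risk bound):

```python
import numpy as np

def theorem1_excess(Ws, B, gamma, m, delta):
    # Order of the excess-risk term in Theorem 1, constants in O(.) dropped.
    d = len(Ws)
    h = max(max(W.shape) for W in Ws)
    spec = np.array([np.linalg.norm(W, 2) for W in Ws])
    frob = np.array([np.linalg.norm(W, "fro") for W in Ws])
    complexity = (B**2 * d**2 * h * np.log(d * h)
                  * np.prod(spec**2) * np.sum(frob**2 / spec**2))
    return np.sqrt((complexity + np.log(d * m / delta)) / (gamma**2 * m))
```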

References

[1] P. Bartlett, D. J. Foster, and M. Telgarsky. Spectrally-normalized margin bounds for neural networks. arXiv preprint, 2017.
[2] P. L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.
[3] P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
[4] G. K. Dziugaite and D. M. Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint, 2017.
[5] N. Harvey, C. Liaw, and A. Mehrabian. Nearly-tight VC-dimension bounds for piecewise linear neural networks. arXiv preprint, 2017.
[6] J. Langford and R. Caruana. (Not) bounding the true error. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. MIT Press, 2001.
[7] J. Langford and J. Shawe-Taylor. PAC-Bayes & margins. In Advances in Neural Information Processing Systems, 2003.
[8] D. McAllester. Simplified PAC-Bayesian margin bounds. Lecture Notes in Computer Science, pages 203–215, 2003.
[9] D. A. McAllester. Some PAC-Bayesian theorems. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, 1998.
[10] D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory. ACM, 1999.
[11] B. Neyshabur, R. Tomioka, and N. Srebro. Norm-based capacity control in neural networks. In Proceedings of the 28th Conference on Learning Theory (COLT), 2015.
[12] B. Neyshabur, S. Bhojanapalli, D. McAllester, and N. Srebro. Exploring generalization in deep learning. arXiv preprint, 2017.
