Bayesian Estimation of the Entropy of the Multivariate Gaussian
Bayesian Estimation of the Entropy of the Multivariate Gaussian

Santosh Srivastava, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Maya R. Gupta, Department of Electrical Engineering, University of Washington, Seattle, WA, USA

Abstract — Estimating the entropy of a Gaussian distribution from samples drawn from that distribution is a difficult problem when the number of samples is smaller than the number of dimensions. A new Bayesian entropy estimator is proposed using an inverted Wishart distribution and a data-dependent prior that handles the small-sample case. Experiments for six different cases show that the proposed estimator provides good performance for the small-sample case compared to the standard nearest-neighbor entropy estimator. Additionally, it is shown that the Bayesian estimate formed by taking the expected entropy minimizes expected Bregman divergence.

I. INTRODUCTION

Entropy is a useful description of the predictability of a distribution, and has use in many applications in coding, machine learning, signal processing, communications, and chemistry. In practice, many continuous generating distributions are modeled as Gaussian distributions. One reason for this is the central limit theorem; another is that the Gaussian is the maximum entropy distribution given an empirical mean and covariance, and as such is a least assumptive model. For the multivariate Gaussian distribution, the entropy goes as the log determinant of the covariance; specifically, the differential entropy of a d-dimensional random vector X drawn from the Gaussian N(µ, Σ) is

    h(X) = −∫ N(x) ln N(x) dx = (d/2) ln(2πe) + (1/2) ln |Σ|.

In this paper we consider the practical problem of estimating the entropy of a generating normal distribution N given samples x_1, x_2, ..., x_n that are assumed to be realizations of random vectors X_1, X_2, ..., X_n drawn iid from N(µ, Σ). In particular, we focus on the difficult limited-data case that n ≤ d, which occurs often in practice with high-dimensional data.
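The closed-form expression above is easy to evaluate numerically. As a minimal sketch (function names and structure are my own, not the paper's), the exact entropy and a naive plug-in estimate from data can be computed as:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of N(mu, cov): (d/2) ln(2*pi*e) + (1/2) ln|cov|."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)  # slogdet is more stable than log(det(.))
    return 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * logdet

def ml_plugin_entropy(X):
    """Plug the ML covariance estimate into the entropy formula.

    The ML covariance is rank-deficient whenever n <= d, so the
    log-determinant (and hence the estimate) degenerates -- exactly
    the limited-data regime discussed in the text."""
    S_ml = np.atleast_2d(np.cov(X, rowvar=False, bias=True))  # bias=True divides by n
    return gaussian_entropy(S_ml)

# For the standard normal in d = 3 dimensions, h = (3/2) ln(2*pi*e) ~ 4.2568 nats.
print(gaussian_entropy(np.eye(3)))
```

Using `slogdet` rather than `det` avoids overflow for large d, but it cannot rescue the rank-deficient case: that is what motivates the estimators below.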
One approach to estimating h(X) is to first estimate the Gaussian distribution, for example with maximum likelihood (ML) estimates of µ and Σ, and then plug the covariance estimate into the entropy expression above. Such estimates are usually infeasible when n ≤ d. The ML estimate is also negatively biased, so that it underestimates the entropy on average. We believe it is more effective, and will create fewer numerical problems, to directly estimate a single value for the entropy rather than first estimating the entire covariance matrix in order to produce the scalar entropy estimate. To this end, we propose a data-dependent Bayesian estimate using an inverted Wishart prior. The choice of prior in Bayesian estimation can dramatically change the results, but there is little practical guidance for choosing priors. The main contributions of this work are showing that the inverted Wishart prior enables estimates for n ≤ d, and that using a rough data-dependent estimate as a parameter to the inverted Wishart prior yields a more robust estimator than ignoring the data when creating the prior.

II. RELATED WORK

First we review related work in parametric entropy estimation, then in nonparametric estimation. Ahmed and Gokhale investigated uniformly minimum variance unbiased (UMVU) entropy estimators for parametric distributions [6]. They showed that the UMVU entropy estimate for the Gaussian is

    ĥ_UMVU = (d/2) ln(πe) + (1/2) ln |S| − (1/2) Σ_{i=1}^{d} ψ((n − i)/2),

where S = Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄)^T, and the digamma function is defined ψ(z) = (d/dz) ln Γ(z), where Γ is the standard gamma function. Bayesian entropy estimation for the multivariate normal was first proposed in 2005 by Misra et al. [5]. They form an entropy estimate by substituting an estimate of ln |Σ| into the entropy expression, where the estimate minimizes the squared-error risk of the entropy estimate. That is, it solves

    arg min_{δ ∈ R} E_{µ,Σ}[(δ − ln |Σ|)²],

where here Σ denotes a random covariance matrix and µ a random vector, and the expectation is taken with respect to the posterior. We will denote by µ and Σ realizations of the corresponding random quantities. Misra et al.
consider different priors with support over the set of symmetric positive definite Σ. They show that given a noninformative prior p(µ, Σ), the solution to the above minimization yields the Bayesian entropy estimate

    ĥ_Bayesian = (d/2) ln(πe) + (1/2) ln |S| − (1/2) Σ_{i=1}^{d} ψ((n − i − 1)/2).
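The digamma corrections in estimators of this type come from the classical result that |S|/|Σ| is distributed as a product of independent chi-square variables. A quick Monte Carlo sketch (my own illustration, using the standard correction for a scatter matrix of n samples, which has n − 1 Wishart degrees of freedom) confirms that the correction debiases ln |S|:

```python
import numpy as np
from scipy.special import digamma

def unbiased_logdet(S, n):
    """Unbiased estimate of ln|Sigma| from the scatter matrix S of n samples.

    Uses E[ln|S|] = ln|Sigma| + sum_{i=1}^d (digamma((n - i)/2) + ln 2),
    which follows from |S|/|Sigma| ~ prod_i chi2_{n-i} for a Wishart
    matrix with n - 1 degrees of freedom."""
    d = S.shape[0]
    _, logdet = np.linalg.slogdet(S)
    i = np.arange(1, d + 1)
    return logdet - np.sum(digamma((n - i) / 2.0) + np.log(2.0))

rng = np.random.default_rng(1)
d, n, trials = 3, 20, 2000
estimates = []
for _ in range(trials):
    X = rng.standard_normal((n, d))      # true Sigma = I, so ln|Sigma| = 0
    Xc = X - X.mean(axis=0)
    estimates.append(unbiased_logdet(Xc.T @ Xc, n))
print(np.mean(estimates))  # close to 0
```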
For discrete random variables, a Bayesian approach that uses binning for estimating functionals such as entropy was proposed the same year as Misra et al.'s work [7]. It is interesting to note that the estimates ĥ_UMVU and ĥ_Bayesian were derived from different perspectives, but differ only slightly in the digamma argument. Like ĥ_UMVU, Misra et al. show that ĥ_Bayesian is an unbiased entropy estimate. They also show that ĥ_Bayesian is dominated by a Stein-type estimator,

    ĥ_Stein = (1/2) ln |S + n x̄ x̄^T| + c,

where c is a function of d and n. Further, they also show that ĥ_Bayesian is dominated by a Brewster–Zidek-type estimator,

    ĥ_BZ = (1/2) ln |S + n x̄ x̄^T| + c̃,

where c̃ is a function of S and x̄ x̄^T that requires calculating the ratio of two definite integrals, stated in full in [5]. Misra et al. found that in simulated numerical experiments their Stein-type and Brewster–Zidek-type estimators achieved only a small percentage improvement over the simpler Bayesian estimate ĥ_Bayesian, and thus they recommend using the computationally much simpler ĥ_Bayesian in applications. There are two practical problems with the previously proposed parametric estimators. First, the estimates require calculating the determinant of S or S + n x̄ x̄^T, which can be numerically infeasible if there are few samples. Second, the Bayesian estimate ĥ_Bayesian uses digamma functions with arguments (n − i − 1)/2, which require n > d + 1 samples so that every digamma argument is positive, and similarly ĥ_UMVU uses arguments (n − i)/2, which require n > d samples. Thus, although the knowledge that one is estimating the entropy of a Gaussian should be of use, for the n ≤ d case one must turn to nonparametric entropy estimators. A thorough review of work in nonparametric entropy estimation was written by Beirlant et al. [4], including density estimation approaches, sample-spacing approaches, and nearest-neighbor estimators.
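As a concrete example of the nonparametric route, a Kozachenko–Leonenko-style nearest-neighbor entropy estimate takes only a few lines. This sketch is my own brute-force implementation of the standard form of that estimator:

```python
import numpy as np
from math import gamma, pi, log

def nn_entropy(X):
    """Nearest-neighbor (Kozachenko-Leonenko) differential entropy estimate, in nats.

    h_NN = (d/n) sum_i ln ||x_i - x_(i,NN)|| + ln(n - 1) + euler_gamma + ln V_d,
    with V_d the volume of the unit d-ball. Note it fails (log of zero)
    if two samples coincide, e.g. for coarsely quantized data."""
    n, d = X.shape
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise sq. distances
    np.fill_diagonal(D2, np.inf)                                # exclude self-matches
    r = np.sqrt(D2.min(axis=1))                                 # nearest-neighbor distances
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    return (d / n) * np.sum(np.log(r)) + log(n - 1) + np.euler_gamma + log(V_d)

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 1))
print(nn_entropy(X))  # should approach (1/2) ln(2*pi*e) ~ 1.4189 for the 1-d standard normal
```

A k-d tree would replace the O(n²) distance matrix for large n; the brute-force version keeps the sketch dependency-free.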
Recently, Nilsson and Kleijn showed that high-rate quantization approximations of Zador and Gray can be used to estimate Rényi entropy, and that the limiting case of Shannon entropy produces a nearest-neighbor estimate that depends on the number of quantization cells [8]. The special case of their nearest-neighbor estimate that best validates the high-rate quantization assumptions is when the number of quantization cells is as large as possible. They show that this special case produces the nearest-neighbor differential entropy estimator originally proposed by Kozachenko and Leonenko in 1987 [9]:

    ĥ_NN = (d/n) Σ_{i=1}^{n} ln ‖x_i − x_{i,NN}‖ + ln(n − 1) + γ + ln V_d,

where x_{i,NN} is x_i's nearest neighbor in the sample set, γ is the Euler–Mascheroni constant, and V_d is the volume of the d-dimensional hypersphere with radius 1: V_d = π^{d/2} / Γ(d/2 + 1). A problem with this approach is that in practice data samples may not be in general position; for example, image pixel data are usually quantized, e.g. to 8 bits. Thus, it can happen in practice that two samples have the exact same measured value, so that a nearest-neighbor distance is zero and the entropy estimate is ill-defined. Though there are various fixes, such as pre-dithering the quantized data, it is not clear what effect these fixes could have on the estimated entropy. A different approach is taken by Costa and Hero [2], [3], [10]; they use the Beardwood–Halton–Hammersley result that a function of the length of a minimum spanning graph converges to the Rényi entropy [11] to form an estimator based on the empirical length of a minimum spanning tree of the data. Unfortunately, how to use this approach to estimate Shannon entropy remains an open question.

III. BAYESIAN ESTIMATE WITH INVERTED WISHART PRIOR

We propose to estimate the entropy as E_N[h(N)], where N is a random Gaussian, and the prior p(N) is an inverted Wishart distribution with scale parameter q and parameter matrix B. We use a Fisher information metric to define a measure over the Riemannian manifold formed by the set of Gaussian distributions.
These choices for prior and measure are very similar to the choices that we found worked well for Bayesian quadratic discriminant analysis [12], and further details on this framework can be found in that work. The resulting proposed inverted Wishart Bayesian entropy estimate is

    ĥ_iW-Bayesian = (d/2) ln(πe) + (1/2) ln |S + B| − (1/2) Σ_{i=1}^{d} ψ((n + q − i − 1)/2).

Proof: To show that this is E_N[h(N)], we will need the expectation E_Σ[ln |Σ|] for a random covariance matrix Σ drawn from an inverted Wishart distribution with scale parameter q and matrix parameter V. Recall that for such Σ, the determinant can be decomposed in distribution as |Σ| = |V| / Π_{i=1}^{d} χ²_{q−i−1} [13], where χ²_k denotes a chi-square random variable with k degrees of freedom. Taking the natural log of both sides, using the fact that ln |A⁻¹| = −ln |A|, and then taking the expectation gives

    E_Σ[ln |Σ|] = ln |V| − Σ_{i=1}^{d} ψ((q − i − 1)/2) − d ln 2,

where we have used the property of the χ² distribution that E[ln χ²_k] = ψ(k/2) + ln 2.
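The chi-square log-moment identity used above, E[ln χ²_k] = ψ(k/2) + ln 2, is easy to check by simulation (an illustrative sanity check, not part of the paper):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(3)
k = 7                                     # degrees of freedom
samples = rng.chisquare(k, size=200_000)
mc = np.log(samples).mean()               # Monte Carlo estimate of E[ln chi2_k]
exact = digamma(k / 2) + np.log(2.0)      # the identity used in the proof
print(mc, exact)  # the two agree to about two decimal places
```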
Now we use the above identity to find ĥ_iW-Bayesian = E_N[h(N)]. Solving for E_N[h(N)] only requires computing

    E_N[ln |Σ|] = ∫ ln |Σ| p(Σ | x_1, x_2, ..., x_n) dΣ,

where the measure of integration results from the Fisher information metric, which converts E_N[h(N)] from an integral over the statistical manifold of Gaussians to an integral over covariance matrices, and the posterior p(Σ | x_1, x_2, ..., x_n) is given in [12]. The posterior is itself inverted Wishart with matrix parameter S + B and scale parameter n + q, so applying the identity above with V = S + B,

    E_N[ln |Σ|] = ln |S + B| − Σ_{i=1}^{d} ψ((n + q − i − 1)/2) − d ln 2.

Replacing ln |Σ| in the Gaussian entropy expression with this computed expectation produces the estimator.

A. Choice of Prior Parameters q and B

The inverted Wishart distribution is a unimodal prior that gives maximum a priori probability to Gaussians with Σ = B/q. Previous work using the inverted Wishart for Bayesian estimation of Gaussians has used the identity matrix for B [14], or a scaled identity where the scale factor was learned by cross-validation given labeled training data for classification [15]. Using B = I sets the maximum of the prior at I/q, regardless of whether the data are measured in nanometers or trillions of dollars. To a rough approximation, the prior regularizes the likelihood towards the prior maximum at B/q, and thus the bias added by using the prior with B/q = I can be ill-suited to the problem. Instead, setting B/q to be a rough estimate of the covariance can add a bias that is more appropriate for the problem. For example, we have shown that using B = q diag(S) can work well when estimating Gaussians for classification by Bayesian quadratic discriminant analysis [12]. For entropy estimation, B = q diag(S)/n works excellently if the true covariance is diagonal, but can perform poorly if the true covariance is a full covariance, because the determinant of diag(S)/n can be significantly higher than the determinant of S, biasing the entropy estimate to be too high.
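The warning about diagonal B and full covariances can be made concrete with Hadamard's inequality, |diag(S)| ≥ |S|: when n ≤ d the scatter matrix is rank-deficient, so its log-determinant collapses while the diagonal surrogate's does not. A small sketch (my own, with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 10, 8                              # limited-data regime: n <= d
R = rng.standard_normal((d, d))
Sigma = R.T @ R                           # a full (non-diagonal) true covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc                             # scatter matrix, rank-deficient here

# Hadamard's inequality: |diag(S)| >= |S|. With a rank-deficient S the gap
# is enormous, so a diagonal B inflates the log-determinant that enters
# the entropy estimate when the true covariance is far from diagonal.
_, logdet_diag = np.linalg.slogdet(np.diag(np.diag(S)))
_, logdet_full = np.linalg.slogdet(S + 1e-9 * np.eye(d))  # tiny ridge so slogdet stays finite
print(logdet_diag, logdet_full)  # the diagonal version is far larger
```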
An optimal choice of B when there is no prior information about S remains an open question; we propose using a diagonal data-dependent choice, B ∝ diag(S), for n ≤ d, and for n > d we let B be the matrix of zeros. The scalar prior parameter q changes the peakedness of the inverted Wishart prior, with higher q corresponding to a more peaked prior and thus higher bias. For the entropy problem there is only one number to estimate, and thus we believe that as little bias as possible should be used. To achieve this, we set q = min(n, d) for n ≤ d. Letting q = 0 when n > d and using the zero matrix for B in this range makes ĥ_iW-Bayesian equivalent to ĥ_Bayesian when n > d.

B. Bregman Divergences and Bayesian Estimation

Misra et al. showed that E_Σ[ln |Σ|] minimizes the squared-error loss, as stated in Section II. Here we show that E_N[h(N)] minimizes any Bregman divergence loss. The Bregman divergences are a class of loss functions that include squared error and relative entropy [16], [17].

Lemma: The mean entropy with respect to uncertainty in the generating distribution, E_N[h(N)], solves

    arg min_{ĥ ∈ R} E_N[d_φ(h(N), ĥ)],

where d_φ(a, b) = φ(a) − φ(b) − φ′(b)(a − b), for a, b ∈ R, is any Bregman divergence with strictly convex φ.

Proof: Set the first derivative to zero:

    0 = (d/dĥ) E_N[φ(h(N)) − φ(ĥ) − φ′(ĥ)(h(N) − ĥ)]
      = E_N[−φ″(ĥ)(h(N) − ĥ)]
      = −φ″(ĥ) E_N[h(N) − ĥ].

Because φ is strictly convex, φ″(ĥ) > 0, and thus it must be that E_N[h(N) − ĥ] = 0, and thus by linearity that the minimizer is ĥ = E_N[h(N)].

IV. EXPERIMENTS

We compare the proposed inverted Wishart Bayesian estimator ĥ_iW-Bayesian with the data-dependent B to ĥ_iW-Bayesian with B = I, to the nearest-neighbor estimator ĥ_NN, to the maximum likelihood estimator formed by replacing µ and Σ in the entropy expression by their maximum likelihood estimates, to ĥ_UMVU, and to ĥ_Bayesian. All results were computed with Matlab; for the digamma function and the Euler–Mascheroni constant we used the corresponding built-in Matlab commands. Simulations were run for a fixed dimension of d = 10 with a varying number of iid samples n drawn from random Gaussians.
For n ≤ d + 1 it was not possible to calculate the digamma functions for ĥ_Bayesian and ĥ_UMVU or the determinant of the sample covariance, thus ĥ_Bayesian, ĥ_UMVU, and the maximum likelihood estimates are only reported for n > d + 1. The nearest-neighbor estimate and the inverted Wishart Bayesian estimates are compared down to very small n. In the first simulation, each generating Gaussian had a diagonal covariance matrix with elements drawn iid from the
[Fig. 1. Comparison of entropy estimators (ML, Bayesian, UMVU, NN, inverted Wishart Bayesian, and data-dependent inverted Wishart Bayesian), averaged over 1,000 iid runs of each simulation. Panels plot the log of the mean squared estimation error against the number of randomly drawn samples: left column, diagonal covariance with elements drawn from U(0, α]; right column, full covariance RᵀR with elements of R drawn from N(0, α).]
uniform distribution on (0, α]. The results are shown in Fig. 1 (left) for the three values of α considered (top, middle, and bottom); Fig. 2 (top) shows the average eigenvalues. In the second simulation, each generating Gaussian had a full covariance matrix RᵀR, where each of the elements of R was drawn iid from a normal distribution N(0, α). The results are shown in Fig. 1 (right) for the same three values of α; Fig. 2 (bottom) shows the average eigenvalues. For each n and each of the six cases, the simulation was run 1,000 times and the results averaged to produce the plots.

[Fig. 2. Average log ranked eigenvalues for the first simulation (top) and the second simulation (bottom).]

For n > d + 1 the three Bayesian estimates are equivalent and perform consistently better than ĥ_UMVU, the maximum likelihood estimate, or the nearest-neighbor estimate. The maximum likelihood estimator is always the worst parametric estimator. Throughout the simulations, the nearest-neighbor estimate makes the least use of additional samples, improving its estimate only slowly. This is reasonable because the nearest-neighbor estimator does not explicitly use the information that the true distribution is Gaussian. Given very few samples, the nearest-neighbor estimator is the best performer for two of the full covariance cases. This suggests that different prior parameter settings could be more effective when there are few samples, perhaps a prior that adds more bias. As expected, in general the data-dependent ĥ_iW-Bayesian achieves lower error than ĥ_iW-Bayesian with B = I, sometimes significantly so, as in the case of a true diagonal covariance with small-variance elements, shown in Fig. 1 (bottom left).

V.
CONCLUSIONS

A data-dependent approach to Bayesian entropy estimation was given that minimizes expected Bregman divergence and performs consistently well compared to other possible estimators for the high-dimensional/limited-data case that n ≤ d.

REFERENCES

[1] T. Cover and J. Thomas, Elements of Information Theory. John Wiley and Sons, 1991.
[2] A. Hero and O. Michel, "Asymptotic theory of greedy approximations to minimal k-point random graphs," IEEE Trans. Information Theory, vol. 45, 1999.
[3] J. Costa and A. Hero, "Geodesic entropic graphs for dimension and entropy estimation in manifold learning," IEEE Trans. Signal Processing, vol. 52, no. 8, 2004.
[4] J. Beirlant, E. Dudewicz, L. Györfi, and E. van der Meulen, "Nonparametric entropy estimation: An overview," International Journal of Mathematical and Statistical Sciences, 1997.
[5] N. Misra, H. Singh, and E. Demchuk, "Estimation of the entropy of a multivariate normal distribution," Journal of Multivariate Analysis, vol. 92, 2005.
[6] N. A. Ahmed and D. V. Gokhale, "Entropy expressions and their estimators for multivariate distributions," IEEE Trans. Information Theory, 1989.
[7] D. Endres and P. Földiák, "Bayesian bin distribution inference and mutual information," IEEE Trans. Information Theory, 2005.
[8] M. Nilsson and W. B. Kleijn, "On the estimation of differential entropy from data located on embedded manifolds," IEEE Trans. Information Theory, 2007.
[9] L. F. Kozachenko and N. N. Leonenko, "Sample estimate of entropy of a random vector," Problems of Information Transmission, vol. 23, 1987.
[10] A. Hero, B. Ma, O. Michel, and J. Gorman, "Applications of entropic spanning graphs," IEEE Signal Processing Magazine, 2002.
[11] J. Beardwood, J. H. Halton, and J. M. Hammersley, "The shortest path through many points," Proc. Cambridge Philos. Soc., vol. 55, 1959.
[12] S. Srivastava, M. R. Gupta, and B. A. Frigyik, "Bayesian quadratic discriminant analysis," Journal of Machine Learning Research, vol. 8, 2007.
[13] M. Bilodeau and D.
Brenner, Theory of Multivariate Statistics. New York: Springer, 1999.
[14] D. G. Keehn, "A note on learning for Gaussian properties," IEEE Trans. Information Theory, vol. 11, 1965.
[15] P. J. Brown, T. Fearn, and M. S. Haque, "Discrimination with many variables," Journal of the American Statistical Association, vol. 94, 1999.
[16] A. Banerjee, X. Guo, and H. Wang, "On the optimality of conditional expectation as a Bregman predictor," IEEE Trans. Information Theory, vol. 51, 2005.
[17] B. A. Frigyik, S. Srivastava, and M. R. Gupta, "Functional Bregman divergence and Bayesian estimation of distributions," arXiv preprint cs.IT, 2006.
JOURNAL OF COMPLEXITY 13, 509 544 (1998) ARTICLE NO. CM970459 On the Value of Partial Information for Learning from Examples Joel Ratsaby* Department of Electrical Engineering, Technion, Haifa, 32000 Israel
More informationLeast Distortion of Fixed-Rate Vector Quantizers. High-Resolution Analysis of. Best Inertial Profile. Zador's Formula Z-1 Z-2
High-Resolution Analysis of Least Distortion of Fixe-Rate Vector Quantizers Begin with Bennett's Integral D 1 M 2/k Fin best inertial profile Zaor's Formula m(x) λ 2/k (x) f X(x) x Fin best point ensity
More informationMath Notes on differentials, the Chain Rule, gradients, directional derivative, and normal vectors
Math 18.02 Notes on ifferentials, the Chain Rule, graients, irectional erivative, an normal vectors Tangent plane an linear approximation We efine the partial erivatives of f( xy, ) as follows: f f( x+
More informationBalancing Expected and Worst-Case Utility in Contracting Models with Asymmetric Information and Pooling
Balancing Expecte an Worst-Case Utility in Contracting Moels with Asymmetric Information an Pooling R.B.O. erkkamp & W. van en Heuvel & A.P.M. Wagelmans Econometric Institute Report EI2018-01 9th January
More informationSTATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING
STATISTICAL LIKELIHOOD REPRESENTATIONS OF PRIOR KNOWLEDGE IN MACHINE LEARNING Mark A. Kon Department of Mathematics an Statistics Boston University Boston, MA 02215 email: mkon@bu.eu Anrzej Przybyszewski
More informationLevel Construction of Decision Trees in a Partition-based Framework for Classification
Level Construction of Decision Trees in a Partition-base Framework for Classification Y.Y. Yao, Y. Zhao an J.T. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canaa S4S
More informationHomework 2 EM, Mixture Models, PCA, Dualitys
Homework 2 EM, Mixture Moels, PCA, Dualitys CMU 10-715: Machine Learning (Fall 2015) http://www.cs.cmu.eu/~bapoczos/classes/ml10715_2015fall/ OUT: Oct 5, 2015 DUE: Oct 19, 2015, 10:20 AM Guielines The
More informationBinary Discrimination Methods for High Dimensional Data with a. Geometric Representation
Binary Discrimination Methos for High Dimensional Data with a Geometric Representation Ay Bolivar-Cime, Luis Miguel Corova-Roriguez Universia Juárez Autónoma e Tabasco, División Acaémica e Ciencias Básicas
More informationAn Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an
More informationThermal conductivity of graded composites: Numerical simulations and an effective medium approximation
JOURNAL OF MATERIALS SCIENCE 34 (999)5497 5503 Thermal conuctivity of grae composites: Numerical simulations an an effective meium approximation P. M. HUI Department of Physics, The Chinese University
More informationThe Principle of Least Action
Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of
More informationProof of Proposition 1
A Proofs of Propositions,2,. Before e look at the MMD calculations in various cases, e prove the folloing useful characterization of MMD for translation invariant kernels like the Gaussian an Laplace kernels.
More informationStable and compact finite difference schemes
Center for Turbulence Research Annual Research Briefs 2006 2 Stable an compact finite ifference schemes By K. Mattsson, M. Svär AND M. Shoeybi. Motivation an objectives Compact secon erivatives have long
More informationJUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson
JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises
More informationSYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. where L is some constant, usually called the Lipschitz constant. An example is
SYSTEMS OF DIFFERENTIAL EQUATIONS, EULER S FORMULA. Uniqueness for solutions of ifferential equations. We consier the system of ifferential equations given by x = v( x), () t with a given initial conition
More informationLecture 10: October 30, 2017
Information an Coing Theory Autumn 2017 Lecturer: Mahur Tulsiani Lecture 10: October 30, 2017 1 I-Projections an applications In this lecture, we will talk more about fining the istribution in a set Π
More informationEquilibrium in Queues Under Unknown Service Times and Service Value
University of Pennsylvania ScholarlyCommons Finance Papers Wharton Faculty Research 1-2014 Equilibrium in Queues Uner Unknown Service Times an Service Value Laurens Debo Senthil K. Veeraraghavan University
More informationKNN Particle Filters for Dynamic Hybrid Bayesian Networks
KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030
More informationA Review of Multiple Try MCMC algorithms for Signal Processing
A Review of Multiple Try MCMC algorithms for Signal Processing Luca Martino Image Processing Lab., Universitat e València (Spain) Universia Carlos III e Mari, Leganes (Spain) Abstract Many applications
More informationAdaptive Local Linear Regression with Application. to Printer Color Management
Aaptive Local Linear Regression with Application 1 to Printer Color Management Maya R. Gupta, Member, IEEE, Eric K. Garcia, Stuent Member, IEEE, an Erika Chin Abstract Local learning methos, such as local
More informationMathematical Review Problems
Fall 6 Louis Scuiero Mathematical Review Problems I. Polynomial Equations an Graphs (Barrante--Chap. ). First egree equation an graph y f() x mx b where m is the slope of the line an b is the line's intercept
More informationCapacity Analysis of MIMO Systems with Unknown Channel State Information
Capacity Analysis of MIMO Systems with Unknown Channel State Information Jun Zheng an Bhaskar D. Rao Dept. of Electrical an Computer Engineering University of California at San Diego e-mail: juzheng@ucs.eu,
More information4. Important theorems in quantum mechanics
TFY4215 Kjemisk fysikk og kvantemekanikk - Tillegg 4 1 TILLEGG 4 4. Important theorems in quantum mechanics Before attacking three-imensional potentials in the next chapter, we shall in chapter 4 of this
More informationSimilarity Measures for Categorical Data A Comparative Study. Technical Report
Similarity Measures for Categorical Data A Comparative Stuy Technical Report Department of Computer Science an Engineering University of Minnesota 4-92 EECS Builing 200 Union Street SE Minneapolis, MN
More informationOptimal CDMA Signatures: A Finite-Step Approach
Optimal CDMA Signatures: A Finite-Step Approach Joel A. Tropp Inst. for Comp. Engr. an Sci. (ICES) 1 University Station C000 Austin, TX 7871 jtropp@ices.utexas.eu Inerjit. S. Dhillon Dept. of Comp. Sci.
More informationLectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs
Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent
More informationCS9840 Learning and Computer Vision Prof. Olga Veksler. Lecture 2. Some Concepts from Computer Vision Curse of Dimensionality PCA
CS9840 Learning an Computer Vision Prof. Olga Veksler Lecture Some Concepts from Computer Vision Curse of Dimensionality PCA Some Slies are from Cornelia, Fermüller, Mubarak Shah, Gary Braski, Sebastian
More informationEIGEN-ANALYSIS OF KERNEL OPERATORS FOR NONLINEAR DIMENSION REDUCTION AND DISCRIMINATION
EIGEN-ANALYSIS OF KERNEL OPERATORS FOR NONLINEAR DIMENSION REDUCTION AND DISCRIMINATION DISSERTATION Presente in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Grauate
More informationOn conditional moments of high-dimensional random vectors given lower-dimensional projections
Submitte to the Bernoulli arxiv:1405.2183v2 [math.st] 6 Sep 2016 On conitional moments of high-imensional ranom vectors given lower-imensional projections LUKAS STEINBERGER an HANNES LEEB Department of
More informationinflow outflow Part I. Regular tasks for MAE598/494 Task 1
MAE 494/598, Fall 2016 Project #1 (Regular tasks = 20 points) Har copy of report is ue at the start of class on the ue ate. The rules on collaboration will be release separately. Please always follow the
More informationA Note on Exact Solutions to Linear Differential Equations by the Matrix Exponential
Avances in Applie Mathematics an Mechanics Av. Appl. Math. Mech. Vol. 1 No. 4 pp. 573-580 DOI: 10.4208/aamm.09-m0946 August 2009 A Note on Exact Solutions to Linear Differential Equations by the Matrix
More informationProblem set 2: Solutions Math 207B, Winter 2016
Problem set : Solutions Math 07B, Winter 016 1. A particle of mass m with position x(t) at time t has potential energy V ( x) an kinetic energy T = 1 m x t. The action of the particle over times t t 1
More informationCHAPTER 1 : DIFFERENTIABLE MANIFOLDS. 1.1 The definition of a differentiable manifold
CHAPTER 1 : DIFFERENTIABLE MANIFOLDS 1.1 The efinition of a ifferentiable manifol Let M be a topological space. This means that we have a family Ω of open sets efine on M. These satisfy (1), M Ω (2) the
More informationDelay Limited Capacity of Ad hoc Networks: Asymptotically Optimal Transmission and Relaying Strategy
Delay Limite Capacity of A hoc Networks: Asymptotically Optimal Transmission an Relaying Strategy Eugene Perevalov Lehigh University Bethlehem, PA 85 Email: eup2@lehigh.eu Rick Blum Lehigh University Bethlehem,
More informationExpected Value of Partial Perfect Information
Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo
More informationError Floors in LDPC Codes: Fast Simulation, Bounds and Hardware Emulation
Error Floors in LDPC Coes: Fast Simulation, Bouns an Harware Emulation Pamela Lee, Lara Dolecek, Zhengya Zhang, Venkat Anantharam, Borivoje Nikolic, an Martin J. Wainwright EECS Department University of
More informationImproving Estimation Accuracy in Nonrandomized Response Questioning Methods by Multiple Answers
International Journal of Statistics an Probability; Vol 6, No 5; September 207 ISSN 927-7032 E-ISSN 927-7040 Publishe by Canaian Center of Science an Eucation Improving Estimation Accuracy in Nonranomize
More informationText S1: Simulation models and detailed method for early warning signal calculation
1 Text S1: Simulation moels an etaile metho for early warning signal calculation Steven J. Lae, Thilo Gross Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresen, Germany
More informationSchrödinger s equation.
Physics 342 Lecture 5 Schröinger s Equation Lecture 5 Physics 342 Quantum Mechanics I Wenesay, February 3r, 2010 Toay we iscuss Schröinger s equation an show that it supports the basic interpretation of
More informationA Modification of the Jarque-Bera Test. for Normality
Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam
More informationIntroduction to Markov Processes
Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav
More informationA New Minimum Description Length
A New Minimum Description Length Soosan Beheshti, Munther A. Dahleh Laboratory for Information an Decision Systems Massachusetts Institute of Technology soosan@mit.eu,ahleh@lis.mit.eu Abstract The minimum
More informationWUCHEN LI AND STANLEY OSHER
CONSTRAINED DYNAMICAL OPTIMAL TRANSPORT AND ITS LAGRANGIAN FORMULATION WUCHEN LI AND STANLEY OSHER Abstract. We propose ynamical optimal transport (OT) problems constraine in a parameterize probability
More informationThe derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)
Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)
More information