EFFICIENT MULTIVARIATE ENTROPY ESTIMATION WITH HINTS OF APPLICATIONS TO TESTING SHAPE CONSTRAINTS

1 EFFICIENT MULTIVARIATE ENTROPY ESTIMATION WITH HINTS OF APPLICATIONS TO TESTING SHAPE CONSTRAINTS
Richard Samworth, University of Cambridge
Joint work with Thomas B. Berrett and Ming Yuan

2 Collaborators October 1,

3 Differential entropy
The (differential) entropy of a random vector $X$ with density function $f$ is defined as
$$H = H(X) = H(f) := -\mathbb{E}\{\log f(X)\} = -\int_{\mathcal{X}} f \log f,$$
where $\mathcal{X} := \{x : f(x) > 0\}$. Estimation of entropy plays a key role in goodness-of-fit tests (Vasicek, 1976; Cressie, 1976), tests of independence (Goria et al., 2005), independent component analysis (Miller and Fisher, 2003), feature selection in classification (Kwak and Choi, 2002), ...

4 Kozachenko–Leonenko estimators
Let $X_1, \ldots, X_n \overset{\mathrm{iid}}{\sim} f$ on $\mathbb{R}^d$. Let $X_{(k),i}$ denote the $k$th nearest neighbour of $X_i$, and let $\rho_{(k),i} := \|X_{(k),i} - X_i\|$. The Kozachenko–Leonenko estimator of the entropy $H$ is
$$\hat{H}_n = \hat{H}_n(X_1, \ldots, X_n) := \frac{1}{n} \sum_{i=1}^n \log\bigl( \rho_{(k),i}^d \, V_d (n-1) e^{-\Psi(k)} \bigr),$$
where $V_d := \pi^{d/2}/\Gamma(1+d/2)$ denotes the volume of the unit $d$-dimensional Euclidean ball and $\Psi$ denotes the digamma function.
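As a concrete illustration (not part of the slides), the estimator above takes only a few lines of Python using SciPy's k-d tree; the function name `kl_entropy` and its interface are our own:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def kl_entropy(X, k=1):
    """Kozachenko-Leonenko entropy estimate from an (n, d) sample X."""
    n, d = X.shape
    # Distance to the k-th nearest neighbour of each point
    # (column 0 of the query result is the point itself, at distance 0).
    rho = cKDTree(X).query(X, k=k + 1)[0][:, k]
    V_d = np.pi ** (d / 2) / gamma(1 + d / 2)  # volume of the unit d-ball
    # Average of log( rho^d * V_d * (n-1) * e^{-Psi(k)} )
    return np.mean(np.log(rho ** d * V_d * (n - 1))) - digamma(k)
```

For a quick sanity check, a standard normal sample should give an estimate near the closed-form value $\tfrac{1}{2}\log(2\pi e) \approx 1.419$.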

5 Intuition
KL estimators attempt to mimic the oracle estimator $H_n^* := -n^{-1} \sum_{i=1}^n \log f(X_i)$, based on the $k$-nearest neighbour density estimate approximation
$$\frac{k}{n-1} \frac{1}{V_d \, \rho_{(k),1}^d} \approx f(X_1).$$
Previous work focuses on $k = 1$ or (recently) $k$ fixed, and often assumes $f$ is compactly supported (Kozachenko and Leonenko, 1987; Tsybakov and Van der Meulen, 1996; Singh et al., 2003; Mnatsakanov et al., 2008; Biau and Devroye, 2015; Delattre and Fournier, 2016; Singh and Póczos, 2016; Gao et al., 2016).

6 The trouble with full support
A Taylor expansion of $H(f)$ around a density estimator $\hat{f}$ yields
$$H(f) \approx -\int_{\mathbb{R}^d} f(x) \log \hat{f}(x)\,dx - \frac{1}{2}\Bigl( \int_{\mathbb{R}^d} \frac{f^2(x)}{\hat{f}(x)}\,dx - 1 \Bigr).$$
When $f$ is bounded away from zero on its support, one can estimate the (smaller-order) second term to obtain efficient estimators in higher dimensions (Laurent, 1996). However, when $f$ is not bounded away from zero on its support, such procedures are no longer effective.

7 Intuition regarding bias
Let $\xi_i := \rho_{(k),i}^d \, V_d (n-1) e^{-\Psi(k)}$, and for $u \in [0, \infty)$, define
$$F_{n,x}(u) := \mathbb{P}(\xi_i \le u \mid X_i = x) = \sum_{j=k}^{n-1} \binom{n-1}{j} p_{n,x,u}^j (1 - p_{n,x,u})^{n-1-j},$$
where $p_{n,x,u} := \int_{B_x(r_{n,u})} f(y)\,dy$ and $r_{n,u} := \bigl\{ e^{\Psi(k)} u / (V_d (n-1)) \bigr\}^{1/d}$. Also define a limiting distribution function
$$F_x(u) := e^{-\lambda_{x,u}} \sum_{j=k}^{\infty} \frac{\lambda_{x,u}^j}{j!},$$
where $\lambda_{x,u} := u f(x) e^{\Psi(k)}$.

8 More intuition regarding bias
We expect that
$$\mathbb{E}(\hat{H}_n) = \int_{\mathcal{X}} f(x) \int_0^\infty \log u \, dF_{n,x}(u)\,dx \approx \int_{\mathcal{X}} f(x) \int_0^\infty \log u \, dF_x(u)\,dx = \int_{\mathcal{X}} f(x) \int_0^\infty \log\Bigl( \frac{t e^{-\Psi(k)}}{f(x)} \Bigr) \frac{e^{-t} t^{k-1}}{(k-1)!}\,dt\,dx = H,$$
where we have substituted $t = \lambda_{x,u}$. The final equality uses $\mathbb{E} \log T = \Psi(k)$ for $T \sim \Gamma(k, 1)$, so the inner integral equals $-\log f(x)$; this identity is precisely why the debiasing factor $e^{-\Psi(k)}$ appears in the estimator.
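The Gamma identity driving the debiasing factor, $\mathbb{E} \log T = \Psi(k)$ for $T \sim \Gamma(k, 1)$, is easy to check by Monte Carlo (our own illustration, not from the slides):

```python
import numpy as np
from scipy.special import digamma

# Verify E[log T] = Psi(k) for T ~ Gamma(k, 1) by simple Monte Carlo.
rng = np.random.default_rng(0)
k = 3
T = rng.gamma(shape=k, scale=1.0, size=10**6)
print(np.log(T).mean(), digamma(k))  # the two values should nearly agree
```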

9 Formal conditions
(A1)($\alpha$): $\int_{\mathbb{R}^d} \|x\|^\alpha f(x)\,dx < \infty$.
(A2)($\beta$): $f$ is bounded and, with $m := \lceil \beta \rceil - 1$ and $\eta := \beta - m$, $f$ is $C^m(\mathbb{R}^d)$. There exist $r > 0$ and a measurable $g : \mathbb{R}^d \to [0, \infty)$ such that for $t = 1, \ldots, m$ and $\|y - x\| \le r$,
$$\|f^{(t)}(x)\| \le g(x) f(x), \qquad \|f^{(m)}(y) - f^{(m)}(x)\| \le g(x) f(x) \|y - x\|^\eta,$$
and $\sup_{x : f(x) \ge \delta} g(x) = o(\delta^{-\epsilon})$ as $\delta \searrow 0$ for every $\epsilon > 0$.

10 Main result on the bias
Suppose (A1)($\alpha$) and (A2)($\beta$) hold for some $\alpha, \beta > 0$. Let $k^* = k_n^*$ be a deterministic sequence of positive integers with $k^* = O(n^{1-\epsilon})$ as $n \to \infty$ for some $\epsilon > 0$.
(i) If $d \ge 3$, $\alpha > \frac{2d}{d-2}$ and $\beta > 2$, then, uniformly for $k \in \{1, \ldots, k^*\}$,
$$\mathbb{E}(\hat{H}_n) - H = \frac{\Gamma(k + 2/d)}{2 V_d^{2/d} (d+2) \Gamma(k) \, n^{2/d}} \int_{\mathcal{X}} \frac{\Delta f(x)}{f(x)^{2/d}}\,dx + o\Bigl( \frac{k^{2/d}}{n^{2/d}} \Bigr).$$
(ii) If $d \le 2$, or $d \ge 3$ with $\alpha \in (0, 2d/(d-2)]$ or $\beta \le 2$, then
$$\mathbb{E}(\hat{H}_n) - H = O\Bigl( \max\Bigl\{ \Bigl( \frac{k^*}{n} \Bigr)^{\alpha/(\alpha+d) - \epsilon}, \Bigl( \frac{k^*}{n} \Bigr)^{\beta/d} \Bigr\} \Bigr)$$
uniformly for $k \in \{1, \ldots, k^*\}$, for every $\epsilon > 0$.

11 Corollary
Assume the conditions in (i), and let $k_0^* = k_{0,n}^*$ be a deterministic sequence of positive integers with $k_{0,n}^* \le k_n^*$ and $k_0^* \to \infty$ as $n \to \infty$. Then
$$\mathbb{E}(\hat{H}_n) - H = \frac{1}{2 V_d^{2/d} (d+2)} \Bigl\{ \int_{\mathcal{X}} \frac{\Delta f(x)}{f(x)^{2/d}}\,dx \Bigr\} \frac{k^{2/d}}{n^{2/d}} + o\Bigl( \frac{k^{2/d}}{n^{2/d}} \Bigr)$$
uniformly for $k \in \{k_0^*, \ldots, k^*\}$.

12 Prewhitening
For any $A \in \mathbb{R}^{d \times d}$ with $|\det A| > 0$, we have $H(AX_1) = H(X_1) + \log |\det A|$. We can therefore set $Y_i := A X_i$, and let
$$\hat{H}_n^A = \hat{H}_n^A(X_1, \ldots, X_n) := \hat{H}_n(Y_1, \ldots, Y_n) - \log |\det A|.$$
Under the conditions in (i), the dominant bias contribution is
$$\frac{\Gamma(k + 2/d)}{2 V_d^{2/d} (d+2) \Gamma(k) \, n^{2/d}} \int_{\mathcal{X}} \frac{\mathrm{tr}(A^T \nabla^2 f(x) A)}{|\det A|^{2/d} f(x)^{2/d}}\,dx.$$
In general, it is hard to compare this with the original bias, but...
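A minimal sketch of prewhitening with the choice $A = \hat{\Sigma}^{-1/2}$ discussed on the next slide (our own code; a basic KL estimator is defined inline so the snippet is self-contained):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def kl_entropy(X, k=1):
    """Basic Kozachenko-Leonenko estimate from an (n, d) sample."""
    n, d = X.shape
    rho = cKDTree(X).query(X, k=k + 1)[0][:, k]   # k-th NN distance
    V_d = np.pi ** (d / 2) / gamma(1 + d / 2)
    return np.mean(np.log(rho ** d * V_d * (n - 1))) - digamma(k)

def kl_entropy_prewhitened(X, k=1):
    """Estimate H(X) from Y = A X with A = Sigma_hat^{-1/2}, then undo
    the shift H(AX) = H(X) + log|det A|."""
    w_eig, U = np.linalg.eigh(np.cov(X, rowvar=False))
    A = U @ np.diag(w_eig ** -0.5) @ U.T          # Sigma_hat^{-1/2}
    log_det_A = -0.5 * np.sum(np.log(w_eig))      # log|det A|
    return kl_entropy(X @ A.T, k) - log_det_A
```

On an anisotropic Gaussian sample the prewhitened estimate should recover the closed-form entropy $\tfrac{d}{2}\log(2\pi e) + \tfrac{1}{2}\log \det \Sigma$.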

13 More on prewhitening
Suppose $f(x) = |\Sigma|^{-1/2} f_0(\Sigma^{-1/2} x)$, where either $f_0(z) = \prod_{j=1}^d g(z_j)$ for some density $g$, or $f_0(z) = g(\|z\|_p)$ for some $p \in (0, \infty)$ and $g : [0, \infty) \to [0, \infty)$. Then
$$\int_{\mathcal{X}} \frac{\Delta f(x)}{f(x)^{2/d}}\,dx = |\Sigma|^{1/d} \, \frac{\mathrm{tr}(\Sigma^{-1})}{d} \int_{\mathbb{R}^d} \frac{\Delta f_0(x)}{f_0(x)^{2/d}}\,dx,$$
$$\int_{\mathcal{X}} \frac{\mathrm{tr}(A^T \nabla^2 f(x) A)}{|\det A|^{2/d} f(x)^{2/d}}\,dx = |A \Sigma A^T|^{1/d} \, \frac{\mathrm{tr}\{(A \Sigma A^T)^{-1}\}}{d} \int_{\mathbb{R}^d} \frac{\Delta f_0(x)}{f_0(x)^{2/d}}\,dx.$$
By the AM–GM inequality, $A \mapsto |A \Sigma A^T|^{1/d} \, \mathrm{tr}\{(A \Sigma A^T)^{-1}\}$ is minimised when the eigenvalues of $A \Sigma A^T$ are equal. This motivates the choice $A = \hat{\Sigma}^{-1/2}$.

14 Examples
[Figure: MSE as a function of $k$ for samples of size $n$ from $N_3(0, \Sigma)$, for two choices of $\Sigma$, comparing the standard and prewhitened estimators.]

15 Asymptotic variance
Assume (A1)($\alpha$) for some $\alpha > d$ and that (A2)(2) holds. Let $k_0^*$ and $k_1^*$ denote deterministic sequences with $k_0^* \le k_1^*$, $k_0^* / \log^5 n \to \infty$ and $k_1^* = O(n^\tau)$, where
$$\tau < \min\Bigl\{ \frac{2\alpha}{5\alpha + 3d}, \frac{\alpha - d}{2\alpha} \Bigr\}.$$
Then $\mathrm{Var}\,\hat{H}_n = n^{-1} \mathrm{Var} \log f(X_1) + o(n^{-1})$ as $n \to \infty$, uniformly for $k \in \{k_0^*, \ldots, k_1^*\}$.

16 Efficiency in low dimensions
Assume $d \le 3$ and that the previous conditions hold. If $d = 3$, assume also that $k_1^* = o(n^{1/4})$. Then
$$n^{1/2}(\hat{H}_n - H) \overset{d}{\to} N\bigl( 0, \mathrm{Var} \log f(X_1) \bigr),$$
and $n \, \mathbb{E}\{ (\hat{H}_n - H)^2 \} \to \mathrm{Var} \log f(X_1)$ as $n \to \infty$.

17 Weighted KL estimators
For weights $w_1, \ldots, w_k$ with $\sum_{j=1}^k w_j = 1$, define
$$\hat{H}_n^w := \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^k w_j \log \xi_{(j),i},$$
where $\xi_{(j),i} := e^{-\Psi(j)} V_d (n-1) \rho_{(j),i}^d$. If
$$\sum_{j=1}^k w_j \frac{\Gamma(j + 2/d)}{\Gamma(j)} = 0,$$
then under (A1)($\alpha$) with $\alpha > d$ and (A2)($\beta$) with $\beta > 2$, we have $\mathbb{E}(\hat{H}_n^w) - H = o(n^{-1/2})$ when $d = 4$; and if $\beta > 5/2$, the same conclusion holds when $d = 5$.
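A sketch of the weighted estimator (our own code and naming): with all weight on $j = k$ it reduces to the standard KL estimator, which gives an easy sanity check.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def weighted_kl_entropy(X, w):
    """Weighted KL estimate: (1/n) sum_i sum_j w_j log xi_{(j),i},
    where xi_{(j),i} = e^{-Psi(j)} V_d (n-1) rho_{(j),i}^d."""
    n, d = X.shape
    k = len(w)
    V_d = np.pi ** (d / 2) / gamma(1 + d / 2)
    # Distances to the first k nearest neighbours (drop the self-distance).
    rho = cKDTree(X).query(X, k=k + 1)[0][:, 1:]
    j = np.arange(1, k + 1)
    log_xi = d * np.log(rho) + np.log(V_d * (n - 1)) - digamma(j)
    return float(np.mean(log_xi @ np.asarray(w)))
```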

18 Choosing weights in the general case
Let
$$W^{(k)} := \Bigl\{ w \in \mathbb{R}^k : \sum_{j=1}^k w_j \frac{\Gamma(j + 2l/d)}{\Gamma(j)} = 0 \text{ for } l = 1, \ldots, \lfloor d/4 \rfloor, \; \sum_{j=1}^k w_j = 1, \; w_j = 0 \text{ for } j \notin \{ \lfloor k/d \rfloor, \lfloor 2k/d \rfloor, \ldots, k \} \Bigr\}.$$
Then there exists $k_d$ such that for $k \ge k_d$, we can find $w^{(k)} \in W^{(k)}$ with $\sup_{k \ge k_d} \|w^{(k)}\| < \infty$.
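One way to compute an element of $W^{(k)}$ numerically (our own sketch; the minimum-norm least-squares route and the name `kl_weights` are assumptions, not the authors' construction): set up the sum-to-one and Gamma-ratio constraints as a small linear system over the allowed support and take the minimum-norm solution.

```python
import numpy as np
from scipy.special import gammaln

def kl_weights(k, d):
    """Minimum-norm weight vector supported on {floor(j*k/d) : j=1..d},
    summing to 1 and killing the Gamma(j+2l/d)/Gamma(j) moments for
    l = 1..floor(d/4). Requires k >= d."""
    support = sorted({(j * k) // d for j in range(1, d + 1)})  # includes k
    j = np.array(support, dtype=float)
    rows, targets = [np.ones_like(j)], [1.0]                   # sum w_j = 1
    for l in range(1, d // 4 + 1):
        # Gamma(j + 2l/d) / Gamma(j), computed stably via gammaln
        rows.append(np.exp(gammaln(j + 2 * l / d) - gammaln(j)))
        targets.append(0.0)
    A = np.vstack(rows)
    # Underdetermined consistent system: lstsq returns the min-norm solution.
    w_support, *_ = np.linalg.lstsq(A, np.array(targets), rcond=None)
    w = np.zeros(k)
    w[np.array(support) - 1] = w_support
    return w
```

For example, with $k = 8$, $d = 4$ the support is $\{2, 4, 6, 8\}$ and there is a single Gamma-ratio constraint ($l = 1$).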

19 Efficiency in arbitrary dimensions
Assume (A1)($\alpha$) and (A2)($\beta$), that $k^* = k_n^*$ is as before, and that $w = w^{(k)} \in W^{(k)}$ for large $n$ with $\limsup_n \|w^{(k)}\| < \infty$.
(i) If $\alpha > d$, $\beta > d/2$ and $k_1^* = o\bigl( \min\bigl\{ n^{\lfloor d/4 \rfloor / (1 + \lfloor d/4 \rfloor)}, n^{1 - d/(2\beta)} \bigr\} \bigr)$, then
$$n^{1/2}(\hat{H}_n^w - H) \overset{d}{\to} N\bigl( 0, \mathrm{Var} \log f(X_1) \bigr), \qquad n \, \mathbb{E}\{ (\hat{H}_n^w - H)^2 \} \to \mathrm{Var} \log f(X_1),$$
uniformly for $k \in \{k_0^*, \ldots, k_1^*\}$.
(ii) If $d \ge 4$, $\alpha > d$ and $\beta \in [2, d/2]$, then $\hat{H}_n^w - H = O_p(k^{\beta/d} / n^{\beta/d})$ uniformly for $k \in \{k_0^*, \ldots, k_1^*\}$.

20 Log-concave projections
Let $\mathcal{P}_d$ be the set of probability measures $P$ on $\mathbb{R}^d$ with $\int_{\mathbb{R}^d} \|x\|\,dP(x) < \infty$ and $P(H) < 1$ for all hyperplanes $H$. Let $\mathcal{F}_d$ be the set of all upper semi-continuous log-concave densities on $\mathbb{R}^d$. Then there is a well-defined projection $\psi^* : \mathcal{P}_d \to \mathcal{F}_d$ given by
$$\psi^*(P) := \operatorname*{argmax}_{f \in \mathcal{F}_d} \int_{\mathbb{R}^d} \log f \, dP$$
(Dümbgen, S. and Schuhmacher, 2011).

21 Risk bounds
Let $X_1, \ldots, X_n \overset{\mathrm{iid}}{\sim} f_0 \in \mathcal{F}_d$ with empirical distribution $\mathbb{P}_n$, and let $\hat{f}_n := \psi^*(\mathbb{P}_n)$. Let $d_{\mathrm{KL}}^2(f, g) := \int_{\mathbb{R}^d} f \log(f/g)$. Then
$$d_{\mathrm{KL}}^2(\hat{f}_n, f_0) \le \frac{1}{n} \sum_{i=1}^n \log \frac{\hat{f}_n(X_i)}{f_0(X_i)} =: d_X^2(\hat{f}_n, f_0)$$
(Dümbgen, S. and Schuhmacher, 2011; Kim, Guntuboyina and S., 2016). Moreover,
$$\sup_{f_0 \in \mathcal{F}_d} \mathbb{E}_{f_0} d_X^2(\hat{f}_n, f_0) = \begin{cases} O(n^{-4/5}) & \text{if } d = 1, \\ O(n^{-2/3} \log n) & \text{if } d = 2, \\ O(n^{-1/2} \log n) & \text{if } d = 3 \end{cases}$$
(Kim and S., 2016; Kim, Guntuboyina and S., 2016).

22 Estimating entropy under log-concavity
Let
$$\hat{H}_n^{\mathrm{LC}} := -\frac{1}{n} \sum_{i=1}^n \log \hat{f}_n(X_i).$$
Then, for $d = 1, 2$,
$$n^{1/2}(\hat{H}_n^{\mathrm{LC}} - H) = -n^{1/2} \int_{\mathbb{R}^d} \log f_0 \, d(\mathbb{P}_n - P_0) - n^{1/2} d_X^2(\hat{f}_n, f_0) \overset{d}{\to} N\bigl( 0, \mathrm{Var} \log f_0(X_1) \bigr).$$

23 Testing log-concavity
Let $X \sim P \in \mathcal{P}_d$ and let $X^* \sim \psi^*(P)$. Then $H(X^*) \ge H(X)$, with equality if and only if $X$ has a log-concave density. To test $H_0 : f_0 \in \mathcal{F}_d$, set $T_n := \hat{H}_n - \hat{H}_n^{\mathrm{LC}}$. Under $H_0$, for $d = 1, 2$, $T_n = o_p(n^{-1/2})$, provided $f_0$ satisfies the conditions for $\hat{H}_n$ to be efficient. Under $H_1$, $T_n \overset{p}{\to} H(f_0) - H(f_0^*) < 0$, where $f_0^* := \psi^*(P)$.

24 References
Berrett, T. B., Samworth, R. J. and Yuan, M. (2016) Efficient multivariate entropy estimation via k-nearest neighbour distances.
Biau, G. and Devroye, L. (2015) Lectures on the Nearest Neighbor Method. Springer, New York.
Cressie, N. (1976) On the logarithms of high-order spacings. Biometrika, 63.
Delattre, S. and Fournier, N. (2016) On the Kozachenko–Leonenko entropy estimator.
Dümbgen, L., Samworth, R. and Schuhmacher, D. (2011) Approximation by log-concave distributions with applications to regression. Ann. Statist., 39.
Gao, W., Oh, S. and Viswanath, P. (2016) Demystifying fixed k-nearest neighbor information estimators.

25 Goria, M. N., Leonenko, N. N., Mergel, V. V. and Novi Inverardi, P. L. (2005) A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparam. Statist., 17.
Kim, A. K. H. and Samworth, R. J. (2016) Global rates of convergence in log-concave density estimation. Ann. Statist., to appear.
Kozachenko, L. F. and Leonenko, N. N. (1987) Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii, 23.
Kwak, N. and Choi, C. (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans. Patt. Anal. and Mach. Intell., 24.
Laurent, B. (1996) Efficient estimation of integral functionals of a density. Ann. Statist., 24.
Miller, E. G. and Fisher, J. W. (2003) ICA using spacings estimates of entropy. J. Mach. Learn. Res., 4.

26 Mnatsakanov, R. M., Misra, N., Li, S. and Harner, E. J. (2008) K_n-nearest neighbor estimators of entropy. Math. Meth. Statist., 17.
Singh, H., Misra, N., Hnizdo, V., Fedorowicz, A. and Demchuk, E. (2003) Nearest neighbor estimates of entropy. Amer. J. Math. Man. Sci., 23.
Singh, S. and Póczos, B. (2016) Analysis of k nearest neighbor distances with application to entropy estimation.
Tsybakov, A. B. and Van der Meulen, E. C. (1996) Root-n consistent estimators of entropy for densities with unbounded support. Scand. J. Statist., 23.
Vasicek, O. (1976) A test for normality based on sample entropy. J. Roy. Statist. Soc. Ser. B, 38.


More information

Information Theory and Communication

Information Theory and Communication Information Theory and Communication Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/8 General Chain Rules Definition Conditional mutual information

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Shape constrained estimation: a brief introduction

Shape constrained estimation: a brief introduction Shape constrained estimation: a brief introduction Jon A. Wellner University of Washington, Seattle Cowles Foundation, Yale November 7, 2016 Econometrics lunch seminar, Yale Based on joint work with: Fadoua

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

A Probabilistic Upper Bound on Differential Entropy

A Probabilistic Upper Bound on Differential Entropy A Probabilistic Upper Bound on Differential Entropy Joseph DeStefano, Qifeng Lu and Erik Learned-Miller Abstract The differential entrops a quantity employed ubiquitousln communications, statistical learning,

More information

1 Exercises for lecture 1

1 Exercises for lecture 1 1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )

More information

ABCME: Summary statistics selection for ABC inference in R

ABCME: Summary statistics selection for ABC inference in R ABCME: Summary statistics selection for ABC inference in R Matt Nunes and David Balding Lancaster University UCL Genetics Institute Outline Motivation: why the ABCME package? Description of the package

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

6.1 Variational representation of f-divergences

6.1 Variational representation of f-divergences ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016

More information

MATH 6337 Second Midterm April 1, 2014

MATH 6337 Second Midterm April 1, 2014 You can use your book and notes. No laptop or wireless devices allowed. Write clearly and try to make your arguments as linear and simple as possible. The complete solution of one exercise will be considered

More information

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs Dávid Pál Department of Computing Science University of Alberta Edmonton, AB, Canada dpal@cs.ualberta.ca

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation Univariate Density Estimation Suppose tat we ave a random sample of data X 1,..., X n from an unknown continuous distribution wit probability density function (pdf) f(x) and cumulative

More information

Nonparametric regression for topology. applied to brain imaging data

Nonparametric regression for topology. applied to brain imaging data , applied to brain imaging data Cleveland State University October 15, 2010 Motivation from Brain Imaging MRI Data Topology Statistics Application MRI Data Topology Statistics Application Cortical surface

More information

The Hilbert Transform and Fine Continuity

The Hilbert Transform and Fine Continuity Irish Math. Soc. Bulletin 58 (2006), 8 9 8 The Hilbert Transform and Fine Continuity J. B. TWOMEY Abstract. It is shown that the Hilbert transform of a function having bounded variation in a finite interval

More information

EVANESCENT SOLUTIONS FOR LINEAR ORDINARY DIFFERENTIAL EQUATIONS

EVANESCENT SOLUTIONS FOR LINEAR ORDINARY DIFFERENTIAL EQUATIONS EVANESCENT SOLUTIONS FOR LINEAR ORDINARY DIFFERENTIAL EQUATIONS Cezar Avramescu Abstract The problem of existence of the solutions for ordinary differential equations vanishing at ± is considered. AMS

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Dinesh Krithivasan and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science,

More information

Random projection ensemble classification

Random projection ensemble classification Random projection ensemble classification Timothy I. Cannings Statistics for Big Data Workshop, Brunel Joint work with Richard Samworth Introduction to classification Observe data from two classes, pairs

More information

Inverse Statistical Learning

Inverse Statistical Learning Inverse Statistical Learning Minimax theory, adaptation and algorithm avec (par ordre d apparition) C. Marteau, M. Chichignoud, C. Brunet and S. Souchet Dijon, le 15 janvier 2014 Inverse Statistical Learning

More information

ALMOST PERIODIC SOLUTIONS OF NONLINEAR DISCRETE VOLTERRA EQUATIONS WITH UNBOUNDED DELAY. 1. Almost periodic sequences and difference equations

ALMOST PERIODIC SOLUTIONS OF NONLINEAR DISCRETE VOLTERRA EQUATIONS WITH UNBOUNDED DELAY. 1. Almost periodic sequences and difference equations Trends in Mathematics - New Series Information Center for Mathematical Sciences Volume 10, Number 2, 2008, pages 27 32 2008 International Workshop on Dynamical Systems and Related Topics c 2008 ICMS in

More information

Optimal Estimation of a Nonsmooth Functional

Optimal Estimation of a Nonsmooth Functional Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose

More information

Estimation of the functional Weibull-tail coefficient

Estimation of the functional Weibull-tail coefficient 1/ 29 Estimation of the functional Weibull-tail coefficient Stéphane Girard Inria Grenoble Rhône-Alpes & LJK, France http://mistis.inrialpes.fr/people/girard/ June 2016 joint work with Laurent Gardes,

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Metric Embedding for Kernel Classification Rules

Metric Embedding for Kernel Classification Rules Metric Embedding for Kernel Classification Rules Bharath K. Sriperumbudur University of California, San Diego (Joint work with Omer Lang & Gert Lanckriet) Bharath K. Sriperumbudur (UCSD) Metric Embedding

More information

Maximum likelihood estimation of a log-concave density based on censored data

Maximum likelihood estimation of a log-concave density based on censored data Maximum likelihood estimation of a log-concave density based on censored data Dominic Schuhmacher Institute of Mathematical Statistics and Actuarial Science University of Bern Joint work with Lutz Dümbgen

More information

Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n

Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n Jiantao Jiao (Stanford EE) Joint work with: Kartik Venkat Yanjun Han Tsachy Weissman Stanford EE Tsinghua

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

SMOOTHNESS PROPERTIES OF SOLUTIONS OF CAPUTO- TYPE FRACTIONAL DIFFERENTIAL EQUATIONS. Kai Diethelm. Abstract

SMOOTHNESS PROPERTIES OF SOLUTIONS OF CAPUTO- TYPE FRACTIONAL DIFFERENTIAL EQUATIONS. Kai Diethelm. Abstract SMOOTHNESS PROPERTIES OF SOLUTIONS OF CAPUTO- TYPE FRACTIONAL DIFFERENTIAL EQUATIONS Kai Diethelm Abstract Dedicated to Prof. Michele Caputo on the occasion of his 8th birthday We consider ordinary fractional

More information

ON A WEIGHTED INTERPOLATION OF FUNCTIONS WITH CIRCULAR MAJORANT

ON A WEIGHTED INTERPOLATION OF FUNCTIONS WITH CIRCULAR MAJORANT ON A WEIGHTED INTERPOLATION OF FUNCTIONS WITH CIRCULAR MAJORANT Received: 31 July, 2008 Accepted: 06 February, 2009 Communicated by: SIMON J SMITH Department of Mathematics and Statistics La Trobe University,

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Logarithmic Harnack inequalities

Logarithmic Harnack inequalities Logarithmic Harnack inequalities F. R. K. Chung University of Pennsylvania Philadelphia, Pennsylvania 19104 S.-T. Yau Harvard University Cambridge, assachusetts 02138 1 Introduction We consider the relationship

More information

Adaptivity to Local Smoothness and Dimension in Kernel Regression

Adaptivity to Local Smoothness and Dimension in Kernel Regression Adaptivity to Local Smoothness and Dimension in Kernel Regression Samory Kpotufe Toyota Technological Institute-Chicago samory@tticedu Vikas K Garg Toyota Technological Institute-Chicago vkg@tticedu Abstract

More information