SUPPLEMENTARY MATERIAL TO EFFICIENT MULTIVARIATE ENTROPY ESTIMATION VIA K-NEAREST NEIGHBOUR DISTANCES

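arXiv:1606.00304v2

By Thomas B. Berrett, Richard J. Samworth and Ming Yuan
University of Cambridge and University of Wisconsin–Madison

APPENDIX

This is the supplementary material to Berrett, Samworth and Yuan (2017), hereafter referred to as the main text.

A.1. Proofs of auxiliary results.

Proof of Proposition 9 in the main text. Fix $\tau \in \bigl(\frac{d}{\alpha+d}, 1\bigr]$. We first claim that given any $\epsilon > 0$, there exists $A_\epsilon > 0$ such that $a(\delta) \le A_\epsilon \delta^{-\epsilon}$ for all $\delta \in (0, \gamma]$. To see this, observe that there exists $\delta_0 \in (0, \gamma]$ such that $a(\delta) \le \delta^{-\epsilon}$ for $\delta \le \delta_0$. But then
\[
\sup_{\delta \in (0,\gamma]} \delta^{\epsilon} a(\delta) \le \max\{1, \gamma^{\epsilon} a(\delta_0)\} \le \gamma^{\epsilon} \delta_0^{-\epsilon},
\]
which establishes the claim, with $A_\epsilon := \gamma^{\epsilon} \delta_0^{-\epsilon}$. Now choose $\epsilon = \frac{1}{3}\bigl(\tau - \frac{d}{\alpha+d}\bigr)$ and let $\tau' := \frac{\tau}{3} + \frac{2d}{3(\alpha+d)} \in \bigl(\frac{d}{\alpha+d}, 1\bigr)$, so that $\tau - 2\epsilon = \tau'$. Then, by Hölder's inequality, and since $\alpha\tau'/(1-\tau') > d$,
\[
\sup_{f \in \mathcal{F}_{d,\theta}} \int_{\{x : f(x) < \delta\}} a(f(x)) f(x)^{\tau} \, dx
\le A_\epsilon \delta^{\epsilon} \sup_{f \in \mathcal{F}_{d,\theta}} \int_{\{x : f(x) < \delta\}} f(x)^{\tau'} \, dx
\le A_\epsilon \delta^{\epsilon} (1+\nu)^{\tau'} \biggl\{ \int_{\mathbb{R}^d} (1 + \|x\|^{\alpha})^{-\frac{\tau'}{1-\tau'}} \, dx \biggr\}^{1-\tau'} \to 0
\]
as $\delta \searrow 0$, as required.

Research supported by a PhD scholarship from the SIMS fund.
Research supported by an EPSRC Early Career Fellowship and a grant from the Leverhulme Trust.
Research supported by NSF FRG Grant DMS-1265202 and NIH Grant 1-U54AI117924-01.
AMS 2000 subject classifications: 62G05, 62G20.
Keywords and phrases: efficiency, entropy estimation, Kozachenko–Leonenko estimator, weighted nearest neighbours.
imsart-aos ver. 2014/10/16 file: WKLAoSFinalSupp.tex date: January 28, 2018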
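The exponent bookkeeping in the first part of the proof of Proposition 9 can be checked mechanically. The sketch below is illustrative only: the function name `prop9_exponents` and the sample values of τ, α and d are assumptions, not part of the paper. It uses exact rational arithmetic to confirm that τ − 2ε = τ′, that τ′ lies in (d/(α+d), 1), and that ατ′/(1−τ′) > d, which is the condition making the final integral finite.

```python
from fractions import Fraction


def prop9_exponents(tau, alpha, d):
    """Return (eps, tau_prime) as chosen in the first part of the proof.

    eps       = (tau - d/(alpha+d)) / 3
    tau_prime = tau/3 + 2d / (3(alpha+d))
    """
    lower = Fraction(d) / (alpha + d)  # the threshold d / (alpha + d)
    if not (lower < tau <= 1):
        raise ValueError("tau must lie in (d/(alpha+d), 1]")
    eps = (tau - lower) / 3
    tau_prime = tau / 3 + 2 * lower / 3
    return eps, tau_prime


# Hypothetical sample values, chosen only for illustration.
tau, alpha, d = Fraction(9, 10), Fraction(2), 3
eps, tau_prime = prop9_exponents(tau, alpha, d)
lower = Fraction(d) / (alpha + d)

assert tau_prime == tau - 2 * eps                 # f^(tau - eps) <= delta^eps * f^(tau')
assert lower < tau_prime < 1                      # tau' stays in the admissible range
assert alpha * tau_prime / (1 - tau_prime) > d    # integrability of (1+|x|^alpha)^(-tau'/(1-tau'))
```

Exact fractions are used rather than floats so that the identity τ − 2ε = τ′ is verified exactly rather than up to rounding.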

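For the second part, fix $\rho > 0$, set $\epsilon := \frac{1}{2}\bigl(\tau - \frac{d}{\alpha+d}\bigr)$ and $\tau' := \frac{\tau}{2} + \frac{d}{2(\alpha+d)} \in \bigl(\frac{d}{\alpha+d}, 1\bigr)$. Then, by Hölder's inequality again,
\[
\sup_{f \in \mathcal{F}_{d,\theta}} \int_{\mathcal{X}} a(f(x))^{\rho} f(x)^{\tau} \, dx
\le A_{\epsilon/\rho}^{\rho} \sup_{f \in \mathcal{F}_{d,\theta}} \int_{\mathcal{X}} f(x)^{\tau'} \, dx
\le A_{\epsilon/\rho}^{\rho} (1+\nu)^{\tau'} \biggl\{ \int_{\mathbb{R}^d} (1 + \|x\|^{\alpha})^{-\frac{\tau'}{1-\tau'}} \, dx \biggr\}^{1-\tau'} < \infty,
\]
as required.

Proof of Lemma 10. (i) The lower bound is immediate from the fact that $h_x(r) \le V_d \|f\|_\infty r^d$ for any $r > 0$. For the upper bound, observe that by Markov's inequality, for any $r > 0$,
\[
h_x(\|x\| + r) = \int_{B_x(\|x\|+r)} f(y) \, dy \ge \int_{B_0(r)} f(y) \, dy \ge 1 - \frac{\mu_\alpha(f)}{r^{\alpha}}.
\]
The result follows on substituting $r = \bigl\{\frac{\mu_\alpha(f)}{1-s}\bigr\}^{1/\alpha}$ for $s \in (0,1)$.

(ii) We first prove this result in the case $\beta \in (2,4]$, giving the stated form of $b_1$. Let $C := 4dV_d^{-\beta/d}/(d+\beta)$, and let $y := C a(f(x))^{\beta/2} s \{s/f(x)\}^{\beta/d}$. Now, by the mean value theorem, we have for $r \le r_a(x)$ that
\[
\biggl| h_x(r) - V_d r^d f(x) - \frac{V_d}{2(d+2)} r^{d+2} \Delta f(x) \biggr| \le a(f(x)) f(x) \frac{dV_d}{2(d+\beta)} r^{d+\beta}.
\]
It is convenient to write
\[
s_{x,y} := s - \frac{s^{1+2/d} \Delta f(x)}{2(d+2) V_d^{2/d} f(x)^{1+2/d}} + y.
\]
Then, provided $s_{x,y} \in (0, V_d r_a(x)^d f(x)]$, we have
\[
h_x\biggl( \Bigl\{ \frac{s_{x,y}}{V_d f(x)} \Bigr\}^{1/d} \biggr)
\ge s_{x,y} + \frac{\Delta f(x) \, s_{x,y}^{1+2/d}}{2(d+2) f(x)^{1+2/d} V_d^{2/d}} - \frac{d \, a(f(x)) V_d^{-\beta/d} s_{x,y}^{1+\beta/d}}{2(d+\beta) f(x)^{\beta/d}}.
\]
Now, by our hypothesis, we know that
\[
\sup_{f \in \mathcal{F}_{d,\theta}} \sup_{s \in \mathcal{S}_n} \sup_{x \in \mathcal{X}_n} \max\biggl\{ \frac{V_d^{-2/d} s^{2/d} |\Delta f(x)|}{2(d+2) f(x)^{1+2/d}}, \frac{y}{s} \biggr\}
\le \max\biggl\{ \frac{d^{1/2} V_d^{-2/d}}{2(d+2)} C_n^{2/d}, \; C C_n^{\beta/d} \biggr\} \to 0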

EFFICIENT ENTROPY ESTIMATION 3 as Hece there exists 1 = 1, θ N such that for all 1, all f F,θ, s S a x X, we have 1 2 + 2 s1+2/ x,y s 1+2/ s1+2/ 1/2 V 2/ afxs 2/ 2 2 + 2fx 2/ + y } s Moreover, there exists 2 = 2, θ N such that for all 2, all s S, x X a f F,θ we have s x,y 1+β/ 2s 1+β/ Fially, we ca choose 3 = 3, θ N such that max C 4 β/ 4 + 2V 4 β/, 2 1/2 C 2/ + βv 2/, 3/2 C 2/ } 2 + 2 + βv 2/ + β a such that C 8 1/2 V /2 for 3 It follows that for max 1, 2, 3 =:, for f F,θ, s S a for x X, we have that s x,y 0, V raxfx] a h x y s 1/ x,y V fx} 1/ s afxs1+2/ 2 1/2 V 2/ fx 2/ afxβ/2 s 1+β/ fx β/ 1/2 V 2/ afxs 2/ 2 + 2fx 2/ + y } afxs1+β/ s [C afx2 β/2 4 + 2V 4/ Cafx 2 1/2 V 2/ s fx } 4 β/ } s 2/ fx + βv β fx β β/ ] V 0 + β The lower bou is prove by very similar calculatios, a the result for the case β 2, 4] follows The geeral case ca be prove usig very similar argumets, a is omitte for brevity A2 Auxiliary results for the proof of Theorem 2 i the mai text the efiitio of V f give i the statemet of Theorem 1 Recall Lemma 1 For each N a θ Θ a m N, we have i sup f F,θ X fx logm fx x < ; ii if f F,θ V f > 0; imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

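Proof of Lemma 1. Fix $d \in \mathbb{N}$ and $\theta = (\alpha, \beta, \gamma, \nu, a) \in \Theta$.

(i) For $\epsilon \in (0,1)$ and $t \in (0,1]$, we have $\log(1/t) \le \frac{1}{\epsilon} t^{-\epsilon}$. Let $\epsilon = \frac{\alpha}{m(\alpha+2d)}$, so that $\frac{\alpha(1-m\epsilon)}{m\epsilon} = 2d$. Then, by Hölder's inequality, for any $f \in \mathcal{F}_{d,\theta}$,
\begin{align*}
\int_{\mathcal{X}} f(x) |\log^m f(x)| \, dx
&\le 2^{m-1} \int_{\mathcal{X}} f(x) \log^m\Bigl(\frac{\|f\|_\infty}{f(x)}\Bigr) dx + 2^{m-1} \bigl|\log^m \|f\|_\infty\bigr| \\
&\le \frac{2^{m-1} \|f\|_\infty^{m\epsilon}}{\epsilon^m} \int_{\mathcal{X}} f(x)^{1-m\epsilon} \, dx + 2^{m-1} \bigl|\log^m \|f\|_\infty\bigr| \\
&\le \frac{2^{m-1} \gamma^{m\epsilon}}{\epsilon^m} (1+\nu)^{1-m\epsilon} \biggl\{ \int_{\mathbb{R}^d} (1+\|x\|^{\alpha})^{-\frac{1-m\epsilon}{m\epsilon}} \, dx \biggr\}^{m\epsilon}
+ 2^{m-1} \max\biggl\{ \log^m \gamma, \; \frac{1}{\alpha^m} \log^m\Bigl( \frac{V_d^{\alpha} \nu^{d} (\alpha+d)^{\alpha+d}}{\alpha^{\alpha} d^{d}} \Bigr) \biggr\},
\end{align*}
where the bound on $|\log^m \|f\|_\infty|$ comes from (13) in the main text.

(ii) Now define
\[
A_{d,\theta} := \max\biggl\{ \sup_{f \in \mathcal{F}_{d,\theta}} |H(f)|, \; \frac{1}{2} \Bigl| \log \inf_{f \in \mathcal{F}_{d,\theta}} \|f\|_\infty \Bigr|, \; 1 \biggr\},
\]
and the set
\[
S_{d,\theta} := \bigl\{ x \in \mathcal{X} : e^{-4A_{d,\theta}} \le f(x) \le e^{-2A_{d,\theta}} \bigr\}.
\]
For $f \in \mathcal{F}_{d,\theta}$, $x \in S_{d,\theta}$ and $y \in B_x\bigl( \{8 d^{1/2} a(e^{-4A_{d,\theta}})\}^{-1/(\beta \wedge 1)} \bigr)$ we have by Lemma 2 below that
\[
|f(y) - f(x)| \le \frac{15 d^{1/2}}{7} a(e^{-4A_{d,\theta}}) e^{-2A_{d,\theta}} \|y - x\|^{\beta \wedge 1}. \tag{1}
\]
By the continuity of $f$, there exists $x_0 \in S_{d,\theta}$ such that $f(x_0) = \frac{1}{2} e^{-2A_{d,\theta}} (1 + e^{-2A_{d,\theta}})$. Thus, by (1), we have that $B_{x_0}(r_{d,\theta}) \subseteq S_{d,\theta}$, where
\[
r_{d,\theta} := \biggl\{ \frac{7(1 - e^{-2A_{d,\theta}})}{30 d^{1/2} a(e^{-4A_{d,\theta}})} \biggr\}^{1/(\beta \wedge 1)} \wedge \biggl\{ \frac{1}{8 d^{1/2} a(e^{-4A_{d,\theta}})} \biggr\}^{1/(\beta \wedge 1)}.
\]
Hence
\[
V(f) = \mathbb{E}_f\bigl[ \{\log f(X_1) + H(f)\}^2 \bigr] \ge A_{d,\theta}^2 \, \mathbb{P}_f(X_1 \in S_{d,\theta}) \ge A_{d,\theta}^2 e^{-4A_{d,\theta}} V_d r_{d,\theta}^d,
\]
as required.

The following auxiliary result provides control on deviations of the density arising from the smoothness condition of our $\mathcal{F}_{d,\theta}$ classes.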
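Before turning to that result, the two elementary facts used in part (i) of the proof of Lemma 1 can be sanity-checked numerically: the inequality log(1/t) ≤ t^(−ε)/ε on (0, 1], and the identity α(1 − mε)/(mε) = 2d for ε = α/{m(α + 2d)}. The helper names `eps_for` and `log_bound_gap`, and the sample parameter values, are hypothetical and serve only as an illustration.

```python
import math


def eps_for(m, alpha, d):
    """The choice eps = alpha / (m * (alpha + 2d)) from part (i) of the proof."""
    return alpha / (m * (alpha + 2 * d))


def log_bound_gap(t, eps):
    """Return t^(-eps)/eps - log(1/t); non-negative for t in (0, 1], eps in (0, 1)."""
    return t ** (-eps) / eps - math.log(1 / t)


# Hypothetical sample values (m, alpha, d), for illustration only.
m, alpha, d = 2, 1.0, 3
eps = eps_for(m, alpha, d)

# alpha * (1 - m*eps) / (m*eps) should equal 2d, which exceeds d, so the
# integral of (1 + |x|^alpha)^(-(1 - m*eps)/(m*eps)) over R^d is finite.
assert abs(alpha * (1 - m * eps) / (m * eps) - 2 * d) < 1e-12

# The logarithmic bound holds on a grid of t values in (0, 1].
assert all(log_bound_gap(10.0 ** (-j), eps) >= 0 for j in range(0, 12))
```

The bound follows from log u ≤ u − 1 applied to u = t^(−ε), so the gap is strictly positive rather than merely non-negative.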

EFFICIENT ENTROPY ESTIMATION 5 Lemma 2 For θ = α, β, γ, ν, a Θ, m := β 1, f F,θ a y B x ra x, we have, for multi-iices t with t m, that f t f t y xt x t x 15 1/2 a fx fx y x miβ t,1 7 Proof If t = m the the result follows immeiately from the efiitio of F,θ Heceforth, therefore, assume that m 1 a t m 1 Writig here for the largest absolute etry of a array, we have for y B x ra x that f t f t y xt x t x y x sup f t z B x y x x t z y x f t +1 x + 1/2 y x sup f t +1 z f t +1 x m t l=1 l 1/2 y x l f t +l x z B x y x + m t /2 y x m t sup z B x y x f m z f m x } 1 afxfx y x 1 1/2 y x + β t /2 y x β t 1 151/2 afxfx y x, 7 as require Lemma 3 that Uer the coitios of Theorem 1 i the mai text, we have sup k k 0,,k 1 } sup E f [Ṽ w V f} 2 ] 0 f F,θ Proof For w = w 1,, w k T W k, write suppw := j : w j 0} The E f Ṽ w V f k w j E f log 2 ξ j,1 j=1 w 1 max j suppw X E f log 2 ξ j,1 f log 2 f + E f Ĥw 2 } Hf 2 X f log 2 f + Var f Ĥw + E f Ĥ w 2 Hf 2 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

6 T B BERRETT, R J SAMWORTH AND M YUAN Thus, by Theorem 1 i the mai text, 18 i the proof of that result a Lemma 1i, we have that sup k k 0,,k 1 } sup f F,θ E f Ṽ w V f 0 Now, 2 Var f Ṽ w w 2 1 max j suppw Var f log 2 ξ j,1 + w 2 1 max j,l suppw Cov f log 2 ξ j,1, log 2 ξ l,2 Let a,j := j 3j1/2 log 1/2 0 a a +,j := j + 3j1/2 log 1/2 1 Mimickig argumets i the proof of Theorem 1, for ay m N, j suppw a ɛ > 0, E f log m ξ j,1 fx 1 } = = X fx a+,j 1 a,j 1 0 log m V 1fxh 1 x log m 1s e Ψj e Ψj B j, j s s s k β/ + O max β/ logm 1, k B j, j s s x α α+ ɛ α α+ ɛ } 0, uiformly for j suppw, k k 0,, k 1 } a f F,θ Moreover, by Cauchy Schwarz, we ca ow show, for example, that E f log 4 ξ j,1 = E f [logξ j,1 fx 1 log fx 1 } 4 ] E f log 4 fx 1 uiformly for j suppw, k k 0,, k 1 } a f F,θ The result follows upo otig that we may use a similar approach for the covariace term i 2 to see that sup k k 0,,k 1 } sup f F,θ Var Ṽ w 0 A3 Proof of Propositio 5 i the mai text Proof of Propositio 5 i the mai text I each of the three examples, we provie θ = α, β, γ, ν, a Θ such that f F,θ I fact, β > 0 may be chose arbitrarily i each case i We may choose ay α > 0, a the set ν = 2 α/2 Γ α 2 + 2 /Γ/2 We may also set γ = 2π /2 It remais to fi a A such that 6 i the mai text hols Write H r y := 1 r e y2 /2 r y r e y2 /2 for the rth Hermite polyomial, a ote that H r y p r y, where p r is a polyomial of egree r with o-egative coefficiets Usig multi-iex otatio for partial imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 7 erivatives, if t = t 1,, t 0, 1,, } with t := t 1 + + t, we have f t x x t = fx H tj x j fx j=1 p tj x fxq t x, for some polyomial q r of egree r, with o-egative coefficiets It follows that if y Bx1, the for ay β > 0 with m = β 1, f m x f m y fx y x β m m/2 fx y x β m max f t x t: t =m x t f t y x t m+1/2 max sup f t x + w fx t: t =m+1 x t m+1/2 sup j=1 w B 0 1 w B 0 1 m+1/2 e x q m+1 x + 1 fx + wq m+1 x + w fx Similarly, f r x max r=1,,m fx m/2 max q r x r=1,,m Write gδ := 2 log δ2π /2} 1/2 a efie a A by settig aδ := max1, ãδ}, where } ãδ := m/2 sup max max q r x, 1/2 e x q m+1 x + 1 x: x gδ r=1,,m = m/2 max max q r gδ, 1/2 e gδ } q m+1 gδ + 1 r=1,,m The sup x:fx δ M f,a,β x aδ a aδ = oδ ɛ for every ɛ > 0, so 6 i the mai text hols ii We may choose ay α < ρ, a set ν = 2 α/2 Γ α 2 + 2 ρ/2 α/2 Γ ρ α 2 Γ 2 Γ ρ 2 We may also set γ = Γ ρ 2 + 2 To verify 6 i the mai text for suitable Γρ/2ρ /2 π /2 a A, we ote by iuctio, that if t = t 1,, t 0, 1,, } with t := t 1 + + t, the f t x x t fxq t x 1 + x 2 /ρ t, imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

8 T B BERRETT, R J SAMWORTH AND M YUAN where q r is a polyomial of egree r with o-egative coefficiets Thus, similarly to the Gaussia example, for ay β > 0 with m = β 1, f m x f m y sup sup y Bx 1 fx y x β m x R m+1/2 sup sup x R w B 0 1 say, where A 1,m,ρ [0, Similarly, fx + wq m+1 x + w fx1 + x 2 /ρ m+1 =: A 1,m,ρ, sup max x R r=1,,m f r x fx m/2 sup max x R r=1,,m q r x 1 + x 2 =: A2 /ρ r,m,ρ, say, where A 2,m,ρ [0, Now efiig a A to be the costat fuctio aδ := max1, A 1,m,ρ, A2,m,ρ }, we agai have that sup x:fx δ M f,a,β x aδ, so 6 i the mai text hols iii We may take ay α > 0 a ν = 1, γ = 3 To verify 6 i the mai text, fix β > 0, set m := β 1, a efie a A by aδ := A m max 1, log 2m+1 1 }, δ for some A m 1 epeig oly o m The, by iuctio, we fi that for some costats A m, B m > 0 epeig oly o m, a x 1, 1 M f,a,β x max max r=1,,m A r 1 x 2, sup 2r y:0< y x r ax B m+1 1 x 2 2m+1 a fx, A m+1 fy 1 y 2 2m+1 fx provie A m i the efiitio of a is chose sufficietly large Hece 6 i the mai text agai hols A4 Proof of Propositio 6 i the mai text Proof of Propositio 6 i the mai text To eal with the itegrals over X c, we first observe that by 13 i the mai text there exists a } imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 9 costat C,f > 0, epeig oly o a f, such that X c 3 fx C,f 1 0 X c B k, k s log u x,s s x fx log + log 1 + x µ 1/α α f for every ɛ > 0 Moreover, 4 fx log fx x = Oq1 ɛ, X c } x = O maxq log, q 1 ɛ }, for every ɛ > 0 Now, a slightly simpler argumet tha that use i the proof of Lemma 10ii i the mai text gives that for r 0, r x ], we have h x r V fxr V + β C β, β xr+ We euce, agai usig a slightly simplifie versio of the argumet i Lemma 10ii i the mai text, that there exists 0 N such that for a 0, s [0, 1 ] a x X, we have 5 V fxh 1 x s s 2V β/ + β s1+ β/ C, βx fx 1+ β/ s 2 It follows from 3, 4, 5 a a almost ietical argumet to that leaig to 15 i the mai text that for every 0 a ɛ > 0, a 1 E f Ĥ H V fxh 1 x s fx B k, k s log s x X 0 s + O maxq 1 ɛ, q log, 1 } a 1 2 fx B k, k s V fxh 1 x s s X 0 s s x + O maxq 1 ɛ, q log, 1 } β/ 4V B k+ β/, k C, βx + β B k, k X fx β/ x + O maxq 1 ɛ, q log, 1 }, as require imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
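The inverse ball-probability function h_x^{-1}(s) used throughout these proofs is easy to compute numerically in simple cases. The sketch below is an illustration under stated assumptions, not the authors' code: it takes f to be the standard normal density with d = 1 and α = 2 (so V_1 = 2, ‖f‖_∞ = (2π)^(−1/2) and μ_2(f) = 1), inverts h_x by bisection, and checks the two-sided bound that follows from the proof of Lemma 10(i) in Section A.1, namely {s/(V_d ‖f‖_∞)}^(1/d) ≤ h_x^{-1}(s) ≤ ‖x‖ + {μ_α(f)/(1 − s)}^(1/α). All function names are hypothetical.

```python
import math


def norm_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def h(x, r):
    """h_x(r): probability that N(0,1) falls in the interval [x - r, x + r]."""
    return norm_cdf(x + r) - norm_cdf(x - r)


def h_inv(x, s, tol=1e-12):
    """Invert r -> h_x(r) by bisection; returns a value with h_x(value) >= s."""
    lo, hi = 0.0, 1.0
    while h(x, hi) < s:          # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(x, mid) < s:
            lo = mid
        else:
            hi = mid
    return hi


def lemma10_bounds(x, s):
    """Lower and upper bounds on h_x^{-1}(s) for N(0,1), with d = 1 and alpha = 2."""
    v1 = 2.0                                  # volume of the unit ball in R
    sup_f = 1.0 / math.sqrt(2.0 * math.pi)    # sup norm of the N(0,1) density
    mu_2 = 1.0                                # second moment of N(0,1)
    lower = s / (v1 * sup_f)
    upper = abs(x) + math.sqrt(mu_2 / (1.0 - s))
    return lower, upper


for x in (0.0, 1.5, -2.0):
    for s in (0.1, 0.5, 0.9):
        lower, upper = lemma10_bounds(x, s)
        assert lower <= h_inv(x, s) <= upper
```

At x = 0 and small s the lower bound is nearly attained, since the density is close to its supremum over the whole ball.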

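A.5. Completion of the proof of Lemma 7 in the main text. To prove Lemma 7 in the main text, it remains to bound several error terms arising from arguments that approximate the variance of the unweighted Kozachenko–Leonenko estimator $\hat{H}_n$, and then to show how these arguments may be adapted to yield the desired asymptotic expansion for $\mathrm{Var}(\hat{H}_n^w)$.

A.5.1. Bounds on $S_1, \ldots, S_5$.

To bound $S_1$: By similar methods to those used to bound $R_1$ in the proof of Lemma 3 in the main text, it is straightforward to show that for every $\epsilon > 0$, we have
\[
S_1 = \int_{\mathcal{X}_n^c} f(x) \int_0^1 B_{k,n-k}(s) \log^2 u_{x,s} \, ds \, dx = O\biggl( \frac{k^{\frac{\alpha}{\alpha+d} - \epsilon}}{n^{\frac{\alpha}{\alpha+d} - \epsilon}} \biggr).
\]

To bound $S_2$: For every $\epsilon > 0$, we have that
\[
S_2 = \int_{\mathcal{X}_n} f(x) \int_{\frac{a_n}{n-1}}^1 B_{k,n-k}(s) \log^2 u_{x,s} \, ds \, dx = o(n^{-(3-\epsilon)}),
\]
by very similar arguments to those used to bound $R_2$ in the proof of Lemma 3 in the main text.

To bound $S_3$: We have
\[
\log^2 u_{x,s} - \log^2\Bigl( \frac{(n-1)s}{e^{\Psi(k)} f(x)} \Bigr)
= \biggl\{ 2 \log\Bigl( \frac{(n-1)s}{e^{\Psi(k)} f(x)} \Bigr) + \log\Bigl( \frac{V_d f(x) h_x^{-1}(s)^d}{s} \Bigr) \biggr\} \log\Bigl( \frac{V_d f(x) h_x^{-1}(s)^d}{s} \Bigr).
\]
It therefore follows from Lemma 10(ii) in the main text that for every $\epsilon > 0$,
\[
S_3 = \int_{\mathcal{X}_n} f(x) \int_0^{\frac{a_n}{n-1}} B_{k,n-k}(s) \biggl\{ \log^2 u_{x,s} - \log^2\Bigl( \frac{(n-1)s}{e^{\Psi(k)} f(x)} \Bigr) \biggr\} \, ds \, dx
= O\biggl( \max\biggl\{ \frac{k^{\beta/d}}{n^{\beta/d}} \log n, \; \frac{k^{\frac{\alpha}{\alpha+d} - \epsilon}}{n^{\frac{\alpha}{\alpha+d} - \epsilon}} \biggr\} \biggr).
\]

To bound $S_4$: A simplified version of the argument used to bound $R_4$ in Lemma 3 of the main text shows that for every $\epsilon > 0$,
\[
S_4 = \int_{\mathcal{X}_n} f(x) \int_{\frac{a_n}{n-1}}^1 B_{k,n-k}(s) \log^2\Bigl( \frac{(n-1)s}{e^{\Psi(k)} f(x)} \Bigr) \, ds \, dx = o(n^{-(3-\epsilon)}).
\]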

EFFICIENT ENTROPY ESTIMATION 11 To bou S 5 : Very similar argumets to those use to bou R 1 i Lemma 3 i the mai text show that for every ɛ > 0, α k S 5 = fx log 2 α+ ɛ fx x = O X c α α+ ɛ A52 Bous o T 1, T 2 a T 3 To bou T 1 : Let B Betak 1, k 1 By 13 i the mai text, for every ɛ > 0, T 11 := fxfy log fy logufx F,x F,xu y x X X c c ũ,x,y 2 fxfy log fy k 1 X c X c 1 logu x,s fx B k 1, k 1 s 2s 1 0 k 1 s y x [ fxfy log fy E log 1 } X c X c B + log 1 2B 1 1 B k 1 + log + log fx + log 1 + x } ] µ 1/α E 2B 1 α f k 1 y x k 1 2 + 2α α+ ɛ = o, 2α α+ ɛ where we use the Cauchy Schwarz iequality a elemetary properties of beta raom variables to obtai the fial bou Now let u x := u x,a/ 1 = V 1h 1 x a 1 e Ψk, a cosier T 12 := fxfy log fy logufx F,x F,xu y x X ũ,x,y X c If ũ,x,y u x, the by very similar argumets to those use to bou R 1 a R 2 cf 13 a 14 i the mai text, together with Cauchy Schwarz, logufx F,x F,xu ũ,x,y 6 1 a 1 logu x,s fx B k 1, k s + B k, k 1 s} s log + log fx + log 1 + x µ 1/α α f 3 ɛ, imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

12 T B BERRETT, R J SAMWORTH AND M YUAN for every ɛ > 0 O the other ha, if ũ,x,y < u x, the x y < r,u x +r,u y, where we have ae the r,u y term to ai a calculatio later i the proof Defie the sequece From Lemma 10ii i the mai text, ρ := [ c log 1/ 1 ] 1 sup r,u y = sup h 1 a } k log 1/ k log 1/ y sup = oρ y X y X 1 y X fy δ Now suppose that x X c a y X satisfy y x ρ Choose 0 N large eough that r,u y ρ /2 for all y X, a that log 1 max3/2 8 1/2 /β, 12V 1 2 } for all 0 a k k0,, k 1 } The whe β 0, 1] a 0, usig the fact that B x ρ /2 B y 3ρ /2, we have fw w V fyρ /2 V afyfyρ /2 3ρ /2 β 7 B xρ /2 V fyρ /2 1 3c ρ /2 β } 1 2 V ρ /2 δ a 1 Hece, for all 0, x X c, y X with y x ρ a k k 0,, k 1 }, 8 r,u x + r,u y ρ O other ha, suppose istea that x X c a ρ x := if y X y x ρ Sice X is a close subset of R, we ca fi y X such that y x = ρ x, a set x := ρ ρ x + 1 ρ x ρ y The x y = ρ, so from 7, we x have r,u x ρ /2 for 0 a k k0,, k 1 } Sice B xρ /2 B x ρ x ρ /2, we euce that r,u x ρ x ρ /2 a 9 y X : x y < r,u x + r,u y} = for 0 a k k 0,, k 1 } But for 0, 10 sup 1 sup x X c y X : y x ρ fy 151/2 fx fy c ρ β < 1 7 2, so that if x X c, y X a x y ρ, the fy < 2δ for 0 a k k 0,, k 1 } imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 13 It therefore follows from 6, 8, 9, 10 a the argumet use to bou T 11 that for each ɛ > 0 a 0, T 12 fxfy log fy 1 x y <r,u X x +r,u y} X c X c 0 logufx F,x F,xu y x + o 2 y:fy<2δ fxfy log fy 0 k 1 2 + 2α = o α+ ɛ 2α α+ ɛ logufx F,x F,xu y x + o 2 Fially for T 1, we efie T 13 := r,1 fxfy log fy X Bx c fx 1/ log ufx F,x F,xu y x ũ,x,y By Lemma 10ii i the mai text, we ca fi 1 N such that for 1, k k0,, k 1 }, x X a s a / 1, we have V fxh 1 x s 2s Thus, for 1, k k0,, k 1 }, x X a y Bx c r,1, fx 1/ ũ,x,y 24 log fx 2a fxe Ψk u x Thus, from 6, T 13 = O 2 log We coclue that for every ɛ > 0, k 1 2 + 2α T 1 T 11 + T 12 + T 13 = o α+ ɛ 2α α+ ɛ To bou T 2 : Fix x X a z B 0 Choosig 2 N large eough that r,1 8 1/2 1/β c 1 δ 1/ for 2, we have by Lemma 2 that sup fy fx 1 1 2 y B x r,1 δ 1/ imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

14 T B BERRETT, R J SAMWORTH AND M YUAN for 2, k k0,, k 1 } Also, for all 2, k k0,, k 1 }, we have fy x,z log fy x,z fx log fx fy x,z logfy x,z /fx + log fx fy x,z fx afxfx y x,z x β log fx + 4} Moreover, by argumets use to bou T 11, logufx F,x F,xu E 2B logb 1 z /fx k 1 + log + log fx + log 1 + x } E 2B 1 f k 1, µ 1/α α where B Betak 1, k 1 It follows that for every ɛ > 0, T 2 = eψk fy x,z log fy x,z fx log fx} V 1 X B 0 z /fx α k 1/2 k = O max α+ ɛ } kβ/, α ɛ α+ β/ log2+β/ logufx F,x F,xu z x To bou T 3 : Note that by Fubii s theorem, fx log fx logufx F,x F,xu z x z X B 0 = V X fx log fx = V X fx log fx e Ψk fx 0 u x 0 miufx, } logufx F,x F,xu x ufx logufx F,x F,xu x + O 3 ɛ, for all ɛ > 0, where the orer of the error term follows from a similar argumet to that use to obtai 6 a Lemma 10i Thus, for every ɛ > 0, T 3 = k 1 a 1 V fxh 1 x s fx log fx logu x,s fx k 1 X 0 s } } 1s 2s log B k, k 1 s 1 s x + O 3 ɛ k 1 k 1/2 = O max k α α+ ɛ α α+ kβ/, log ɛ β/ } imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 15 A53 Bous o U 1 a U 2 To bou U 1 : Usig Lemma 10i a 13 i the mai text as i our bous o T 11 we have that for every ɛ > 0, u x U 11 := fx log ufx F,x F,x u x X c 0 a 1 fx logu x,s fx B k, k 1 s 1s k X c 0 k 1 s x 1 k 2 + α α+ ɛ 11 = o 1+ α α+ ɛ Moreover, usig argumets similar to those use to bou R 2 i the proof of Lemma 3 i the mai text, for every ɛ > 0, 12 U 12 := fx log ufx F X u x,x F,x u x = o 3 ɛ From 11, a 12, we have for every ɛ > 0 that 1 k 2 + α U 1 U 11 + U 12 = o α+ ɛ 1+ α α+ ɛ To bou U 2 : By Lemma 10ii a lettig B Betak + β/, k 1, we have that for every ɛ > 0, a 1 U 21 := V fxh 1 x s 1s k } fx log B k, k 1 s s x X 0 s k 1 kβ/ β/ E 1B k k 1 afxfx 1 β/ x X k 1/2 k β/ = O max β/, k α α+ ɛ } α+ ɛ α Moreover, we ca use similar argumets to those use to bou R 4 i the proof of Lemma 3 i the mai text to show that for every ɛ > 0, 1 } U 22 := 1s 1s k fx log X e Ψk B k, k 1 s s x k 1 = o 3 ɛ a 1 We euce that for every ɛ > 0, k 1/2 U 2 U 21 + U 22 = O k β/ max β/, k α α+ ɛ α α+ ɛ } imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

16 T B BERRETT, R J SAMWORTH AND M YUAN A54 Bous o W 1,, W 4 To bou W 1 : We partitio the regio [l x, v x ] [l y, v y ] c ito eight rectagles as follows: [lx, v x ] [l y, v y ] c = [0, lx [0, l y [0, l x [l y, v y ] [0, l x v y, [l x, v x ] [0, l y [l x, v x ] v y, v x, [0, l y v x, [l y, v y ] v x, v y, Recall our shortha hu, v = logufx logvfy By Lemma 10i i the mai text a the Cauchy Schwarz iequality, as well as very similar argumets to those use to bou R 2 i the proof of Lemma 3 i the mai text, we ca bou the cotributios from each rectagle iiviually, to obtai that for every ɛ > 0, W 1 = fxfy hu, v F,x,y F,x F,y u, v x y X X [l x,v x] [l y,v y] c = o 9/2 ɛ To bou W 2 : We have W 2 = fxfy X X vx vy l x l y hu, v G,x,y F,x F,y u, v x y + 1 We write B a,b,c := ΓaΓbΓc Γa+b+c, a, for s, t > 0 with s + t < 1, let 13 B a,b,c s, t := sa 1 t b 1 1 s t c 1 B a,b,c eote the esity of a Dirichleta, b, c raom vector at s, t For a, b > 1, writig I := [a / 1, a + / 1], let B k+a, k := I s k+a 1 1 s k 1 s, B k+a, k s := sk+a 1 1 s k 1 /B k+a, k B k+a,k+b, 2k 1 := I I s k+a 1 t k+b 1 1 s t 2k 2 s t B k+a,k+b, 2k 1 s, t := sk+a 1 t k+b 1 1 s t 2k 2 /B k+a,k+b, 2k 1 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
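As a sanity check on the Dirichlet density in (13), B_{a,b,c}(s, t) = s^(a−1) t^(b−1) (1 − s − t)^(c−1) / B_{a,b,c} with B_{a,b,c} = Γ(a)Γ(b)Γ(c)/Γ(a + b + c), the following sketch integrates it numerically over the simplex {s, t > 0, s + t < 1}. The function names, the midpoint grid and the sample parameters (3, 4, 5) are assumptions made for illustration; they do not come from the paper.

```python
import math


def log_b3(a, b, c):
    """log of B_{a,b,c} = Gamma(a) Gamma(b) Gamma(c) / Gamma(a + b + c)."""
    return math.lgamma(a) + math.lgamma(b) + math.lgamma(c) - math.lgamma(a + b + c)


def dirichlet_density(s, t, a, b, c):
    """Density of a Dirichlet(a, b, c) random vector at (s, t); zero off the simplex."""
    if s <= 0.0 or t <= 0.0 or s + t >= 1.0:
        return 0.0
    log_num = (a - 1) * math.log(s) + (b - 1) * math.log(t) + (c - 1) * math.log(1 - s - t)
    return math.exp(log_num - log_b3(a, b, c))


def integrate_over_simplex(a, b, c, grid=200):
    """Midpoint-rule approximation to the integral of the density over the simplex."""
    step = 1.0 / grid
    total = 0.0
    for i in range(grid):
        s = (i + 0.5) * step
        for j in range(grid - i):            # cells whose lower-left corner is in the simplex
            t = (j + 0.5) * step
            total += dirichlet_density(s, t, a, b, c)
    return total * step * step


# Hypothetical parameters, chosen so the density vanishes smoothly on the boundary.
total_mass = integrate_over_simplex(3.0, 4.0, 5.0)
assert abs(total_mass - 1.0) < 1e-3
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions, which matters for parameters of the size k and n − 2k − 1 that appear in the proof.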

EFFICIENT ENTROPY ESTIMATION 17 The by the triagle a Pisker s iequalities, a Beta tail bous similar to those use previously, we have that B k+a,k+b, 2k 1 s, t B k+a, k sb k+b, k t s t I I B k+a,k+b, 2k 1 1 B k+a,k+b, 2k 1 + B k+a, k B k+b, k 1 B k+a, k B k+b, k B + 2 k+a,k+b, 2k 1 s, t log k+a,k+b, 2k 1 s, t = 2 B I I 1 1 t 0 0 B k+a, k sb k+b, k t Bk+a,k+b, 2k 1 s, t B k+a,k+b, 2k 1 s, t log B k+a, k sb k+b, k t } 1/2 s t } 1/2 s t + o 2 Γ + a + b 1Γ k = 2 [log 1/2 2 + 2k 2ψ 2k 1 Γ 2k 1Γ + aγ + b 14 = k 1 + o1} k 1ψ + b k 1 + ψ + a k 1} + ψ + a + b 1] 1/2 + o 2 As a first step towars bouig W 2 ote that vx vy W 21 := fxfy hu, v G,x,y F,x F,y u, v x y X X l x l y = fxfy logu x,s fx logu y,t fy X X I I Bk,k, 2k 1 s, t B k, k sb k, k t } s t x y 1s 1t = fxfy log X X I I e Ψk log e Ψk Bk,k, 2k 1 s, t B k, k sb k, k t } s t x y + W 211 15 = 1 + O k α α+ ɛ 1+ α α+ ɛ + O 2 + W 211, for every ɛ > 0 But, by Lemma 10ii i the mai text a 14, for every imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

18 T B BERRETT, R J SAMWORTH AND M YUAN ɛ > 0, W 211 = V h 1 x s fx 1t fxfy 2 log log X X I I s e Ψk V h 1 x s fx V h 1 y t } fy + log log s t Bk,k, 2k 1 s, t B k, k sb k, k t } s t x y 2 V h 1 x s fx fxfy log X X I s [ log } 1 Ψ k 1 + log1 s Bk, k 1 s log 1 Ψ } ] B k, k s s x y k 1+ 2β 2α k1+ α+ ɛ } + O max, 16 k 1/2 k β/ = O max β/, k 1+ 2β α α+ ɛ α α+ ɛ } 2α 1+ α+ ɛ Moreover, by Lemma 10i a ii i the mai text a very similar argumets, for every ɛ > 0, vx vy W 22 := fxfy hu, v G,x,y F,x F,y u, v x y X X c l x l y k 1+ α α+ ɛ α k α+ ɛ } kβ/ = O 1+ max, α ɛ α+ α ɛ α+ β/, 1 k 1/2 vx vy W 23 := fxfy hu, v G,x,y F,x F,y u, v x y 17 X c X c k 1+ 2α = O α+ ɛ 2α 1+ α+ ɛ l x l y Icorporatig our restrictios o k, we coclue from 15, 16 a 17 that for every ɛ > 0, W 2 W 21 + 1 k 1/2 k β/ + 2 W 22 + W 23 = O max β/, k α α+ ɛ } α+ ɛ α To bou W 3 : We write h u, h v a h uv for the partial erivatives of hu, v a write, for example, h u F u, v = h u u, vf u, v We fi o itegratig imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 19 by parts that, writig F = F,x,y G,x,y, vx vy h F u, v h uv F u, v u v [l x,v x] [l y,v y] vx = 18 l x [ hu F u, l y h u F u, v y ] u + l x l y vy l y [ hv F l x, v h v F v x, v ] v + hf v x, v y + hf l x, l y hf v x, l y hf l x, v y Usig staar biomial tail bous as use to bou W 1 together with 13 i the mai text we therefore see that for every ɛ > 0, v x vy vx vy } W 31 := fxfy h F u, v h uv F u, v u v x y 19 = X X X X fxfy l x l y vx l x h u F u, v y u + + o 9/2 ɛ l x l y vy l y } h v F v x, v v x y Now, uiformly for u [l x, v x ] a x, y X X a for every ɛ > 0, 2 F u, v y = 1 x y r,u} p k 1 k 1,x,u1 p,x,u k 1 + o 9/2 ɛ B k, k p,x,u = 1 x y r,u} + o 9/2 ɛ 1 1 20 1 x y r,vx } 2πk 1/2 1 + o1} + o 9/2 ɛ By 10 a the argumets leaig up to it, we have 21 sup sup x X c y X B xr,vx +r,vy fx fy 1 0 We therefore have by 13 i the mai text that, for every ɛ > 0, 22 X c X fxfy vx l x k 1 2 + 2α h u F u, v y u y x = O Now, usig Lemma 10ii i the mai text, for x X, α+ ɛ 2α α+ ɛ k β/ 23 max l x fx 1, v x fx 1 } afx + log1/2 fx k 1/2 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

20 T B BERRETT, R J SAMWORTH AND M YUAN We also ee some cotrol over vfy By 10 a the work leaig up to it, for max 0, 5, x X a y x r,vx + r,vy, fy 1 151/2 c ρ β} δ δ /2 k/ 1 7 Thus afy c β a usig 21 we may apply Lemma 10ii i the mai text to the set X = X y : y x r,vx + r,vy for some x X } From this a 21, for ay x X a y B x r,vx + r,vy, k β/ 24 max l y fy 1, v y fy 1 afy + log1/2 fx k 1/2 Usig 21 agai, we have that afy x,z fx ɛ for each ɛ > 0, uiformly for x X a z v x fx} 1/ + v y fx} 1/ From 20, 23 a 24 we therefore have that vx fxfy h u F u, v y u y x X X l x k 1/2 fxfy1 x y <r,vx } logv y fy logv x /l x y x 25 X X k 1/2+2β/ = O max 1+2β/ k 1/2, k 1 2 + α, log α+ ɛ 1+ α α+ ɛ } for every ɛ > 0 By 19, 22 a 25, it follows that k 1/2+2β/ 26 W 31 = O max 1+2β/, log k 2α k 1/2+ α+ ɛ, 1/2 2α α+ ɛ } Fially, by 13 i the mai text a 21, we have sice F = 0 whe x y > r,u + r,v that 27 W 32 := fxfy X c X vx vy Combiig 26 a 27 we have that l x l y 2α k h uv F u, v u v x y = O k 1/2+2β/ W 3 = W 31 + W 32 = O max 1+2β/, log k 1/2, k 2α α+ ɛ 2α α+ ɛ α+ ɛ 2α α+ ɛ } imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

EFFICIENT ENTROPY ESTIMATION 21 To bou W 4 : Let p := B xr,u B yr,v fw w a let N 1, N 2, N 3, N 4 Multi 2, p,x,u p, p,y,v p, p, 1 p,x,u p,y,v + p Further, let so that F 1,x,yu, v := PN 1 + N 3 k, N 2 + N 3 k, F,x,y F 1,x,yu, v = PN 1 + N 3 = k 1, N 2 + N 3 k1 x y r,u} + PN 2 + N 3 = k 1, N 1 + N 3 k1 x y r,v} + PN 1 + N 3 = k 1, N 2 + N 3 = k 11 x y r,u r,v} Now PN 1 +N 3 = k 1 = 2 k 1 p k 1,x,u1 p,x,u k 1 2πk 1/2 1+o1} a F,x,y u, v = G,x,y u, v if x y > r,u + r,v, a so, by 23 a 24, we have that vx vy F,x,y G,x,y u, v fxfy u v x y X X l x l y uv vx vy 1 F,x,y G,x,y u, v = fxfy u v x y X X l x l y uv log + O max k 1/2, k 1 2 + 2β, k 1 2 + α α+ ɛ } 28 α+ ɛ 1+ 2β 1+ α We ca ow approximate F,x,yu, 1 v by Φ Σ k 1/2 ufx 1}, k 1/2 vfx 1} a G,x,y u, v by Φk 1/2 ufx 1}Φk 1/2 vfx 1} To avoi repetitio, we focus o the former of these terms To this e, for i = 3,,, let 1Xi B Y i := xr,u}, 1 Xi B yr,v} so that i=3 Y N1 + N i = 3 We also efie N 2 + N 3 p,x,u µ := EY i = p,y,v p,x,u 1 p V := CovY i =,x,u p p,x,u p,y,v, p p,x,u p,y,v p,y,v 1 p,y,v Whe x X a y B xr,vx + r,vy we have that, writig for the symmetric ifferece a usig 21, PX 1 B x r,u B y r,v > 0 a so V is ivertible for such x a y We may therefore set Z i := V 1/2 Y i µ imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

22 T B BERRETT, R J SAMWORTH AND M YUAN The by the Berry Essee bou of Götze 1991, writig C for the set of close, covex subsets of R 2 a lettig Z N 2 0, I, there exists a uiversal costat C 2 > 0 such that 29 sup P 1 2 1/2 C C i=3 Z i C PZ C C 2E Z 3 3 2 1/2 The istributio of Z 3 epes o x, y, u a v, but, recallig the substitutio y = y x,z as efie i 22 i the mai text, we claim that for x X, y = y x,z B x r,u + r,v, u [l x, v x ] a v [l y, v y ], 30 E Z 3 3 1/2 k z To establish this, ote that for x X a y x r,vx + r,vy, we have k by 21, 23 a 24 that y x fx 1/ Thus, for v [l y, v y ], a usig Lemma 2, we also have that 31 vfx 1 max v y fy 1, l y fy 1 + v y fy fx k β/ afx fy + log1/2 fx k 1/2 Now, by the efiitio of l x a v x, 32 max p,x,u k/ 1, p,y,v k/ 1 } 3k1/2 log 1/2 1 for all x, y X a u [l x, v x ], v [l y, v y ] Next, we bou 2 k p α z for x X a y = y x,z with z v x fx} 1/ + v y fx} 1/ First suppose that u v We may write B x r,u B y r,v = B x r,v B y r,v } [B x r,u \B x r,v } B y r,v ], where this is a isjoit uio Writig I a,b x := x 0 B a,bs s for the regularise icomplete beta fuctio a recallig that µ eotes Lebesgue measure o R, we have µ Bx r,v B y r,v = V r,vi +1 2, 1 2 = veψk 1 I +1 2, 1 2 1 1 x y 2 4r,v 2 z 2 4vfx} 2/ imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
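The volume formula used here for the intersection of two balls of equal radius, μ_d(B_x(r) ∩ B_y(r)) = V_d r^d I_{1 − ‖x−y‖²/(4r²)}((d+1)/2, 1/2), where I denotes the regularised incomplete beta function, can be checked against elementary closed forms in low dimensions. The sketch below is illustrative only: the function names are hypothetical and the incomplete beta function is approximated by a plain midpoint rule rather than taken from a library. In d = 1 the intersection of two intervals has length 2r − ‖x−y‖, and in d = 2 the lens area is 2r² arccos(t/(2r)) − (t/2)(4r² − t²)^(1/2) with t = ‖x−y‖.

```python
import math


def unit_ball_volume(d):
    """V_d = pi^(d/2) / Gamma(1 + d/2)."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)


def reg_inc_beta(x, a, b, steps=20000):
    """Midpoint-rule approximation to I_x(a, b); intended for 0 <= x < 1."""
    if x <= 0.0:
        return 0.0
    step = x / steps
    integral = step * sum(
        ((i + 0.5) * step) ** (a - 1) * (1 - (i + 0.5) * step) ** (b - 1)
        for i in range(steps)
    )
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


def lens_volume(d, r, dist):
    """Volume of the intersection of two balls of radius r in R^d whose centres are dist apart."""
    if dist >= 2 * r:
        return 0.0
    x = 1 - dist ** 2 / (4 * r ** 2)
    return unit_ball_volume(d) * r ** d * reg_inc_beta(x, (d + 1) / 2, 0.5)


# d = 1: two intervals of half-width 1 with centres 0.5 apart overlap in length 1.5.
assert abs(lens_volume(1, 1.0, 0.5) - 1.5) < 1e-6

# d = 2: compare with the classical lens-area formula for unit discs at distance 1.
lens_area = 2 * math.acos(0.5) - 0.5 * math.sqrt(3.0)
assert abs(lens_volume(2, 1.0, 1.0) - lens_area) < 1e-5
```

The midpoint rule loses accuracy as the distance tends to zero, because the integrand then has an integrable singularity at the upper limit; a library implementation of the incomplete beta function would be preferable in that regime.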

a Now, r I +1 2, 1 2 EFFICIENT ENTROPY ESTIMATION 23 α z = I +1 1 2, 1 z 2 2 4 1 r2 = 1 r2 /4 1 2 4 B +1/2,1/2 Hece by the mea value iequality, µ Bx r,v B y r,v eψk α z 1fx eψk 1 1 B +1/2,1/2 [ v z 1 vfx} 1/ + α z 1 vfx B +1/2,1/2 fx It follows that for all x X, y B x r,vx + r,vy a v [l y, v y ], B xr,v B yr,v fw w eψk α z 1 k k β/ afx fy + k1/2 log 1/2 fx usig 31 a Lemma 2 We also have by 32 that fw w p,x,u p,x,v B xr,u\b xr,v} B yr,v ] k k β/ afx fy + k1/2 log 1/2 fx Thus, whe x X, y = y x,z B x r,vx + r,vy, u [l x, v x ], v [l y, v y ] a u v, 33 2 k p α z k β/ afx fy + log1/2 fx k 1/2 We ca prove the same bou whe v > u similarly, usig 23, 31 a Lemma 2 We will also require a lower bou o p,x,u + p,y,v 2p i the regio where B x r,u B y r,v, ie, z ufx} 1/ + vfx} 1/ By the mea value theorem, } 2 1 I +1 2, 1 1 δ 2 2 1/2 /2 δ max, 1 I +1 2 B +1/2,1/2 2, 1 1/2 2 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

24 T B BERRETT, R J SAMWORTH AND M YUAN for all δ [0, 1] Thus, for u v, with v [l y, v y ], x X, a y = y x,z with z 2vfx} 1/, by 31 we have, µ Bx r,u B y r,v c µ Bx r,v B y r,v c } = V r,v x y 2 1 I +1 1 2, 1 2 4r,v 2 k z fx Whe z > 2vfx} 1/ we simply have µ Bx r,v B y r,v c = V r,v a the same overall bou applies Moreover, the same lower bou for µ By r,v B x r,u c hols whe u < v, u [l x, v x ], x X, a y = y x,z B x r,vx +r,vy We euce that for all x X, y = y x,z B x r,vx + r,vy, u [l x, v x ] a v [l y, v y ], 34 p,x,u + p,y,v 2p maxp,x,u p, p,y,v p } k z We are ow i a positio to bou E Z 3 3 above for x X, y = y x,z B x r,vx + r,vy, u [l x, v x ], v [l y, v y ] We write E Z 3 3 = p V 1/2 1 p,x,u 3 + p 1 p,x,u p,y,v V 1/2 1 p,x,u 3 p,y,v + p,y,v p V 1/2 p,x,u 3 1 p,y,v + 1 p,x,u p,y,v + p V 1/2 p,x,u 3 35, a bou each of these terms i tur First, p V 1/2 1 p,x,u 3 1 p,y,v 36 p,y,v = p V 3/2 1 p,x,u 1 p,y,v p,x,u + p,y,v 2p } 3/2 } 3/2 1 p,x,u 1 p,y,v = p p p,x,u p,y,v + p,x,u p p,y,v p p,x,u+p,y,v 2p } p,x,u + p,y,v 1 3/2 p mi, 1/2 /k 1/2, V p p,x,u p,y,v usig 32 a 33, a where we erive the fial bou from the left ha sie of the miimum if z 1 a the right ha sie if z < 1 Similarly, 37 p,x,u p V 1/2 1 p,x,u 3 1/2, p,x,u p p,y,v V 3/2 3/2 k z p,y,v imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

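where we have used (34) for the final bound. By symmetry, the same bound holds for the third term on the right-hand side of (35). Finally, very similar arguments yield
\[
(1 - p_{n,x,u} - p_{n,y,v} + p_\cap)\, \biggl\| V^{-1/2} \begin{pmatrix} p_{n,x,u} \\ p_{n,y,v} \end{pmatrix} \biggr\|^3 \lesssim (k/n)^{3/2}. \tag{38}
\]
Combining (36), (37) and (38) gives (30).

Writing $\Phi_A$ for the measure associated with the $N_2(0, A)$ distribution for invertible $A$, and $\phi_A$ for the corresponding density, we have by Pinsker's inequality and a Taylor expansion of the log-determinant function that
\begin{align*}
2 \sup_{C \in \mathcal{C}} |\Phi_A(C) - \Phi_B(C)|^2 &\leq \int_{\mathbb{R}^2} \phi_A \log \frac{\phi_A}{\phi_B} = \frac{1}{2}\bigl\{\log|B| - \log|A| + \mathrm{tr}\bigl(B^{-1}(A - B)\bigr)\bigr\} \\
&\leq \|B^{-1/2}(A - B)B^{-1/2}\|^2,
\end{align*}
provided $\|B^{-1/2}(A - B)B^{-1/2}\| \leq 1/2$. Hence
\[
\sup_{C \in \mathcal{C}} |\Phi_A(C) - \Phi_B(C)| \leq \min\bigl(1,\ 2\|B^{-1/2}(A - B)B^{-1/2}\|\bigr).
\]
We now take $A = (n-2)V/k$, $B = \Sigma$ and use the submultiplicativity of the Frobenius norm along with (32) and (33) and the fact that $\|\Sigma^{-1/2}\| = \{(1 + \alpha_z)^{-1} + (1 - \alpha_z)^{-1}\}^{1/2}$ to deduce that
\[
\sup_{C \in \mathcal{C}} |\Phi_A(C) - \Phi_B(C)| \lesssim \frac{1}{\|z\|}\biggl\{\Bigl(\frac{k}{n f(x)}\Bigr)^{\beta/d} a\bigl(f(x) \wedge f(y)\bigr) + \frac{\log^{1/2} n}{k^{1/2}}\biggr\} \tag{39}
\]
for $x \in \mathcal{X}_n$, $y \in B_x(r_{n,v_x} + r_{n,v_y})$, $u \in [l_x, v_x]$, $v \in [l_y, v_y]$.

Now let $u = f(x)^{-1}(1 + k^{-1/2}s)$ and $v = f(x)^{-1}(1 + k^{-1/2}t)$. By the mean value theorem, (23) and (31),
\begin{align*}
\Bigl| \Phi_\Sigma\Bigl(k^{-1/2}\Bigl\{(n-2)\mu - \begin{pmatrix} k \\ k \end{pmatrix}\Bigr\}\Bigr) - \Phi_\Sigma(s,t) \Bigr|
&\leq \frac{1}{(2\pi)^{1/2}} \biggl\{ \Bigl|\frac{(n-2)p_{n,x,u} - k}{k^{1/2}} - s\Bigr| + \Bigl|\frac{(n-2)p_{n,y,v} - k}{k^{1/2}} - t\Bigr| \biggr\} \\
&\lesssim k^{1/2} \Bigl(\frac{k}{n f(x)}\Bigr)^{\beta/d} a\bigl(f(x) \wedge f(y)\bigr) + k^{-1/2}. \tag{40}
\end{align*}
It follows by (29), (30), (39) and (40) that for $x \in \mathcal{X}_n$ and $y \in B_x(r_{n,v_x} +$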

26 T B BERRETT, R J SAMWORTH AND M YUAN r,vy, sup u [l x,v x],v [l y,v y] F 1,x,yu, v Φ Σ s, t mi 1, log1/2 k k 1/2 + afx fy z fx β/ k 1/2 + 1 } z Therefore, by 23 a 24, a sice fy fx/2 for x X, y B x r,vx + r,vy a 0, we coclue that for each ɛ > 0 a 0 41 X X vx vy 1 F,xu, v Φ Σ s, t fxfy 1 l x l y uv x y r,u+r,v} u v y x k log 1/2 k β/ } 2 fx X k 1/2 + afx/2 fx sup F,x,y 1 x,z u, v Φ Σ s, t z x k log 5/2 = O max k 3/2 B 0 3 u [l x,v x],v [l yx,z,v yx,z ], k 1 2 + α α+ ɛ α α+ ɛ, k 1/2+β/ log β/, k1/2+2β/ 2β/ } By similar i fact, rather simpler meas we ca establish the same bou for the approximatio of G,x,y by Φk 1/2 ufx 1}Φk 1/2 vfx 1} To coclue the proof for the uweighte case, we write X = X 1 X 2, where X 1 := x : fx k 2β δ }, X 2 := x : δ fx < k 2β δ }, a eal with these two regios separately We have by Slepia s iequality that Φ Σ s, t ΦsΦt for all s a t Hece, recallig that s = s x,u = k 1/2 ufx 1} a t = t x,v = k 1/2 vfx 1}, by 21, 23 a 31, for every ɛ > 0, vx vy Φ Σ s, t ΦsΦt 42 X 2 X fxfy 1 l x e Ψk V 1k X 2 l y uv X 2 1 x y r,u+r,v}u v y x fy x,z 1 x yx,z r,vx+r,vyx,z } R fx 2 l x l yx,z Φ Σ s, t ΦsΦt} s t z x 1+ k 2β α fx α z z x = o B 0 2 α+ ɛ 1+ α α+ ɛ, imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

X 1 X l x EFFICIENT ENTROPY ESTIMATION 27 where to obtai the fial error term, we have use the fact that B 0 2 α z z = V By 23 a 24 we have, for each ɛ > 0, vx vy Φ Σ s, t ΦsΦt fxfy 1 uv x y r,u+r,v} u v y x e Ψk V 1k = eψk 1k X 1 X 1 l y fy x,z 1 x yx,z r,vx+r,vyx,z } R fx 2 α z z x l x l yx,z log 1/2 fx x + O max k 1/2, k β/ α 1+β/, k 43 = eψk log 1/2 1k + O max k 1/2, k β/ k1+ 2β α α+ ɛ, 1+β/ 1+ α α+ ɛ α+ ɛ 1+ α α+ ɛ } By applyig Lemma 10ii i the mai text as for 31 we have, for x X 1 a y B x r,vx + r,vy, that 44 max v v vfx 1 3k 1/2 log 1/2 afx fy x,v y} } k β/ = ok 1/2, fx with similar bous holig for l x a l y A correspoig lower bou of the same orer for the left-ha sie of 43 follows from 44 a the fact that 2 log 2 log 2 log 2 log Φ Σ s, t ΦsΦt} s t = α z + O 2 uiformly for z R It ow follows from 28, 41, 42 a 43 that for each ɛ > 0, log 5/2 W 4 = O max k 1/2 as require, k 3 2 + α ɛ α+ α ɛ 1+ α+, k3/2+2β/ 1+2β/, k1+ 2β α ɛ α+ α ɛ 1+ α+, k 1 2 + β } log, 1+ β We ow tur our attetio to the variace of the weighte Kozacheko Leoeko estimator Ĥw We first claim that 45 k Var w j log ξ j,1 = j=1 k w j w l Covlog ξ j,1, log ξ l,1 = V f + o1 j,l=1 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
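For orientation, the estimator whose variance is analysed from here on can be sketched in a few lines. This is a minimal brute-force sketch, not the authors' code: it assumes the main-text definitions $\xi_{(j),i} = e^{-\Psi(j)} V_d (n-1) \rho_{(j),i}^d$ and $\hat{H}_n^w = n^{-1}\sum_{i=1}^n \sum_{j=1}^k w_j \log \xi_{(j),i}$, where $\rho_{(j),i}$ is the distance from $X_i$ to its $j$th nearest neighbour, and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(j):
    """Digamma function at a positive integer: Psi(j) = -gamma + sum_{i<j} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, j))

def unit_ball_volume(d):
    """Volume V_d of the unit Euclidean ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)

def weighted_kl_entropy(points, weights):
    """Weighted Kozachenko-Leonenko entropy estimate (assumed definition).

    points  : list of d-tuples, assumed distinct
    weights : (w_1, ..., w_k), assumed to sum to one
    """
    n, d, k = len(points), len(points[0]), len(weights)
    if n <= k:
        raise ValueError("need more than k sample points")
    log_scale = math.log(unit_ball_volume(d) * (n - 1))
    total = 0.0
    for i, x in enumerate(points):
        # Brute-force sorted distances from x to all other points.
        dists = sorted(math.dist(x, y) for m, y in enumerate(points) if m != i)
        for j, w in enumerate(weights, start=1):
            if w != 0.0:
                # log xi_{(j),i} = log(V_d (n-1)) + d log(rho_{(j),i}) - Psi(j)
                total += w * (log_scale + d * math.log(dists[j - 1]) - digamma_int(j))
    return total / n
```

Taking `weights = [0, ..., 0, 1]` recovers the unweighted estimator based on the $k$th nearest neighbour only. The quadratic-time neighbour search is chosen for transparency; a tree-based search would be preferable for large samples.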

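By (18), (19) and Lemma 3 in the main text, for $j$ such that $w_j \neq 0$, $\mathrm{Var} \log \xi_{(j),1} = V(f) + o(1)$ as $n \to \infty$. For $l > j$, using similar arguments to those used in the proof of Lemma 3 in the main text, and writing $u^{(k)}_{x,s} := u_{x,s} = V_d (n-1) h_x^{-1}(s)^d e^{-\Psi(k)}$ for clarity, we have
\begin{align*}
\mathbb{E}(\log \xi_{(j),1} \log \xi_{(l),1}) &= \int_{\mathcal{X}} f(x) \int_0^1\!\int_0^{1-s} \log(u^{(j)}_{x,s}) \log(u^{(l)}_{x,s+t})\, B_{j,l-j,n-l}(s,t)\,dt\,ds\,dx \\
&= \int_{\mathcal{X}} f(x) \int_0^1\!\int_0^{1-s} \log\Bigl(\frac{(n-1)s}{f(x)e^{\Psi(j)}}\Bigr) \log\Bigl(\frac{(n-1)(s+t)}{f(x)e^{\Psi(l)}}\Bigr) B_{j,l-j,n-l}(s,t)\,dt\,ds\,dx + o(1) \\
&= \int_{\mathcal{X}} f(x) \log^2 f(x)\,dx + o(1)
\end{align*}
as $n \to \infty$, uniformly for $1 \leq j < l \leq k_1^*$. Now (45) follows on noting that $\sup_{k \geq k_d} \|w^{(k)}\| < \infty$.

Next we claim that
\[
\mathrm{Cov}\biggl(\sum_{j=1}^k w_j \log \xi_{(j),1},\ \sum_{l=1}^k w_l \log \xi_{(l),2}\biggr) = o(n^{-1}) \tag{46}
\]
as $n \to \infty$. In view of (20) in the main text and the fact that $\sup_{k \geq k_d} \|w^{(k)}\| < \infty$, it is sufficient to show that
\[
\mathrm{Cov}\bigl(\log(f(X_1)\xi_{(j),1}),\ \log(f(X_2)\xi_{(l),2})\bigr) = o(n^{-1})
\]
as $n \to \infty$, whenever $w_j, w_l \neq 0$. We suppose without loss of generality here that $j < l$, since the $j = l$ case is dealt with in (27). We broadly follow the same approach used to bound $W_1, \ldots, W_4$, though we require some new (similar) notation. Let $F_{n,x,y}$ denote the conditional distribution function of $(\xi_{(j),1}, \xi_{(l),2})$ given $X_1 = x$, $X_2 = y$ and let $F^{(j)}_{n,x}$ denote the conditional distribution function of $\xi_{(j),1}$ given $X_1 = x$. Let
\[
r^{(j)}_{n,u} := \Bigl\{\frac{u e^{\Psi(j)}}{V_d (n-1)}\Bigr\}^{1/d}, \qquad p^{(j)}_{n,x,u} := h_x(r^{(j)}_{n,u}).
\]
Recall the definitions of $a^{\pm}_{n,j}$ given in the proof of Lemma 3, and let
\[
v_{x,j} := \inf\{u \geq 0 : (n-1) p^{(j)}_{n,x,u} = a^{+}_{n,j}\} \quad\text{and}\quad l_{x,j} := \inf\{u \geq 0 : (n-1) p^{(j)}_{n,x,u} = a^{-}_{n,j}\}.
\]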

EFFICIENT ENTROPY ESTIMATION 29 For pairs u, v with u v x,j a v v y,l, let M 1, M 2, M 3 Multi 2; p j,x,u, p l,y,v, 1 p j,x,u p l,y,v a write Also write G,x,yu, v := PM 1 j, M 2 l Σ := 1 j/l 1/2 α z j/l 1/2 α z 1 where α z := V 1 µ B0 1 B z expψl Ψj 1/ Writig W i for remaier terms to be boue later, we have Cov logfx 1 ξ j,1, logfx 2 ξ l,2 = fxfy hu, v F,x,y F,xF j,yu, l v x y + W 1 X X = fxfy X X = fxfy X X = V 1 e Ψj 1jl 1/2 47 = V 1 e Ψj 1l [l y,l,v y,l ] [l x,j,v x,j ] [l y,l,v y,l ] [l x,j,v x,j ] vy,l vx,j l y,l R, hu, v F,x,y G,x,yu, v x y 1 + F,x,y G,x,y u, v l x,j uv R α z z 1 + u v x y 1 + 2 i=1 3 i=1 Φ Σ s, t ΦsΦt} s t z 1 + 4 4 i=1 1 W i = O + k as The fial equality here follows from the fact that, for Borel measurable sets K, L R, 48 µ K + z L z = µ Kµ L, R so that R α z z = V e Ψl Ψj 4 i=1 W i i=1 W i W i W i To bou W 1 : Very similar argumets to those use to bou W 1 show that W 1 = o 9/2 ɛ as, for every ɛ > 0 imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
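Identity (48), which states that the overlap measure $\mu_d\bigl((K + z) \cap L\bigr)$ integrates over $z$ to $\mu_d(K)\mu_d(L)$, is easy to check numerically in one dimension. The sketch below is an illustration only: it takes $K$ and $L$ to be intervals, uses a midpoint rule, and all names are hypothetical.

```python
def overlap_length(a1, b1, a2, b2):
    """Length of the intersection of the intervals [a1, b1] and [a2, b2]."""
    return max(0.0, min(b1, b2) - max(a1, a2))

def integrated_overlap(K, L, steps=20000):
    """Midpoint-rule approximation of the integral over z of mu((K + z) ∩ L).

    K and L are intervals given as (left, right) pairs.
    """
    (a, b), (c, e) = K, L
    lo, hi = c - b, e - a  # the overlap vanishes for z outside [lo, hi]
    h = (hi - lo) / steps
    total = 0.0
    for m in range(steps):
        z = lo + (m + 0.5) * h
        total += overlap_length(a + z, b + z, c, e)
    return total * h
```

With $d = 1$, $K = [-1, 1]$ and $L = [-R, R]$, dividing the result by $V_1 = 2$ gives $\int \alpha_z \,dz = 2R$, which agrees with $V_d e^{\Psi(l) - \Psi(j)}$ when $R = e^{\Psi(l) - \Psi(j)}$.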

30 T B BERRETT, R J SAMWORTH AND M YUAN To bou W 2 : Similar to our work use to bou W 2, we may show that a+,j 1 a,j 1 a+,l 1 a,l 1 as, for fixe a, b > 1 Also, 1 1 s 0 0 1s log e Ψj B j+a,l+b, j l 1 s, t B j+a, j sb l+b, l t t s 1t log e Ψl jl1/2 1 + o1} B j,l, j l 1 s, t B j, j sb l, l t}ts = 1 + O 2 as Usig these facts a very similar argumets to those use to bou W 2 we have for every ɛ > 0 that k W 2 1/2 k β/ = O max β/, k α α+ ɛ } α+ ɛ α To bou W 3 : Similarly to 18 a the surrouig work, we ca show that for every ɛ > 0, log W 3 = O max k 1/2, k 1 2 + 2β 1+ 2β, k 2α α+ ɛ 2α α+ ɛ } To bou W 4 : Let N 1, N 2, N 3, N 4 Multi 2; p j,x,u p, p l,y,v p, p, 1 p j,x,u p l,y,v+p, where p := B xr j fw w Further, let,u B yr,v l F,1,x,y := PN 1 + N 3 j, N 2 + N 3 l The, as i 28, we have vx,j vy,l F,x,y fxfy G,x,y u, v u v x y X X l x,j l y,l uv vx,j vy,l F,1,x,y G,x,yu, v = fxfy u v x y X X l x,j l y,l uv log + O max k 1/2, k 1 2 + 2β, k 1 2 + α α+ ɛ } α+ ɛ 1+ 2β 1+ α imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018
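For small sample sizes the probability $\mathbb{P}(N_1 + N_3 \geq j,\ N_2 + N_3 \geq l)$ can be computed exactly by enumerating the multinomial distribution, which gives a way of sanity-checking the normal approximation that follows. The sketch below is illustrative only: the function names are hypothetical, and the argument `m` plays the role of $n - 2$.

```python
from math import comb

def multinomial_pmf(counts, probs):
    """Probability of the count vector `counts` under Multi(sum(counts); probs)."""
    remaining = sum(counts)
    coef = 1
    for c in counts:
        coef *= comb(remaining, c)
        remaining -= c
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

def joint_upper_tail(m, p_x, p_y, p_cap, j, l):
    """P(N1 + N3 >= j, N2 + N3 >= l) for
    (N1, N2, N3, N4) ~ Multi(m; p_x - p_cap, p_y - p_cap, p_cap, 1 - p_x - p_y + p_cap).
    """
    probs = (p_x - p_cap, p_y - p_cap, p_cap, 1.0 - p_x - p_y + p_cap)
    total = 0.0
    for n1 in range(m + 1):
        for n2 in range(m + 1 - n1):
            for n3 in range(m + 1 - n1 - n2):
                if n1 + n3 >= j and n2 + n3 >= l:
                    n4 = m - n1 - n2 - n3
                    total += multinomial_pmf((n1, n2, n3, n4), probs)
    return total
```

Setting `l = 0` reduces the joint tail to a binomial tail for $N_1 + N_3$, which is a convenient check. The enumeration has cubic cost in `m`, so it is only practical for small values.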

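We can now approximate $F^{(1)}_{n,x,y}(u,v)$ by $\Phi_\Sigma\bigl(j^{1/2}\{uf(x) - 1\},\ l^{1/2}\{vf(x) - 1\}\bigr)$ and $G_{n,x,y}(u,v)$ by $\Phi\bigl(j^{1/2}\{uf(x) - 1\}\bigr)\Phi\bigl(l^{1/2}\{vf(x) - 1\}\bigr)$. This is rather similar to the corresponding approximation in the bounds on $W_4$, so we only present the main differences. First, let
\[
Y_i := \begin{pmatrix} \mathbf{1}_{\{X_i \in B_x(r^{(j)}_{n,u})\}} \\ \mathbf{1}_{\{X_i \in B_y(r^{(l)}_{n,v})\}} \end{pmatrix}.
\]
We also define
\[
\mu := \mathbb{E}(Y_i) = \begin{pmatrix} p^{(j)}_{n,x,u} \\ p^{(l)}_{n,y,v} \end{pmatrix}
\quad\text{and}\quad
V := \mathrm{Cov}(Y_i) = \begin{pmatrix} p^{(j)}_{n,x,u}(1 - p^{(j)}_{n,x,u}) & p_\cap - p^{(j)}_{n,x,u}\,p^{(l)}_{n,y,v} \\ p_\cap - p^{(j)}_{n,x,u}\,p^{(l)}_{n,y,v} & p^{(l)}_{n,y,v}(1 - p^{(l)}_{n,y,v}) \end{pmatrix},
\]
and set $Z_i := V^{-1/2}(Y_i - \mu)$. Our aim is to provide a bound on $p_\cap$. Since the function
\[
(r, s) \mapsto \mu_d\bigl(B_0(r^{1/d}) \cap B_z(s^{1/d})\bigr)
\]
is Lipschitz, we have for $x \in \mathcal{X}_n$, $y = x + f(x)^{-1/d} r^{(j)}_{n,1} z \in B_x\bigl(r^{(j)}_{n,v_{x,j}} + r^{(l)}_{n,v_{y,l}}\bigr)$, $u \in [l_{x,j}, v_{x,j}]$ and $v \in [l_{y,l}, v_{y,l}]$ that
\[
\Bigl|\frac{(n-2)p_\cap}{e^{\Psi(j)}} - \alpha_z\Bigr| \lesssim \Bigl(\frac{k}{n f(x)}\Bigr)^{\beta/d} a\bigl(f(x) \wedge f(y)\bigr) + \frac{\log^{1/2} n}{k^{1/2}}, \tag{49}
\]
using similar equations to (23), (24) and (31). From this and similar bounds to (32), we find that $|V| \gtrsim k^2/n^2$ and $\|V^{-1/2}\| \lesssim n^{1/2}/k^{1/2}$. We therefore have
\[
\mathbb{E}\|Z_3\|^3 \leq \|V^{-1/2}\|^3\, \mathbb{E}\|Y_3 - \mu\|^3 \lesssim n^{1/2}/k^{1/2},
\]
which is as in the $l = j$ case except with the factor of $\|z\|^{-1/2}$ missing. Note now that
\[
\limsup_{n \to \infty}\ \sup_{j,l\,:\,j<l,\ w_j, w_l \neq 0}\ \sup_{z \in B_0(1 + e^{(\Psi(l) - \Psi(j))/d})} \|\Sigma^{-1/2}\| < \infty.
\]
Hence, using (49), similar bounds to (32) and the same arguments as leading up to (39),
\[
\sup_{C \in \mathcal{C}} |\Phi_A(C) - \Phi_B(C)| \lesssim \Bigl(\frac{k}{n f(x)}\Bigr)^{\beta/d} a\bigl(f(x) \wedge f(y)\bigr) + \frac{\log^{1/2} n}{k^{1/2}}, \tag{50}
\]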

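where $B := \Sigma$ and
\[
A := (n-2) \begin{pmatrix} j^{-1} p^{(j)}_{n,x,u}(1 - p^{(j)}_{n,x,u}) & j^{-1/2} l^{-1/2} (p_\cap - p^{(j)}_{n,x,u}\,p^{(l)}_{n,y,v}) \\ j^{-1/2} l^{-1/2} (p_\cap - p^{(j)}_{n,x,u}\,p^{(l)}_{n,y,v}) & l^{-1} p^{(l)}_{n,y,v}(1 - p^{(l)}_{n,y,v}) \end{pmatrix}.
\]
Now let $u := f(x)^{-1}(1 + j^{-1/2}s)$ and $v := f(x)^{-1}(1 + l^{-1/2}t)$. Similarly to (40), we have
\[
\Bigl| \Phi_\Sigma\Bigl(\frac{(n-2)p^{(j)}_{n,x,u} - j}{j^{1/2}},\ \frac{(n-2)p^{(l)}_{n,y,v} - l}{l^{1/2}}\Bigr) - \Phi_\Sigma(s,t) \Bigr| \lesssim k^{1/2}\Bigl(\frac{k}{n f(x)}\Bigr)^{\beta/d} a\bigl(f(x) \wedge f(y)\bigr) + k^{-1/2}.
\]
Similarly to the arguments leading up to (41), it follows that
\begin{align*}
\int_{\mathcal{X}_n \times \mathcal{X}} f(x) f(y) &\int_{l_{x,j}}^{v_{x,j}}\!\int_{l_{y,l}}^{v_{y,l}} \Bigl|\frac{F^{(1)}_{n,x,y}(u,v) - \Phi_\Sigma(s,t)}{uv}\Bigr|\, \mathbf{1}_{\{\|x-y\| \leq r^{(j)}_{n,u} + r^{(l)}_{n,v}\}} \,du\,dv\,dy\,dx \\
&= \frac{k}{n}\, O\biggl(\max\biggl\{\frac{\log^{3/2} n}{k^{3/2}},\ \frac{k^{\frac{1}{2} + \frac{\alpha}{\alpha+d} - \epsilon}}{n^{\frac{\alpha}{\alpha+d} - \epsilon}},\ \frac{k^{-1/2+\beta/d} \log n}{n^{\beta/d}},\ \frac{k^{1/2+2\beta/d}}{n^{2\beta/d}}\biggr\}\biggr),
\end{align*}
where the power on the first logarithmic factor is smaller because of the absence of the factor of the $\|z\|^{-1}$ term in (50). The remainder of the work required to bound $W_4$ is very similar to the work done from (42) to (43), using also (48), so is omitted. We conclude that
\[
W_4 = O\biggl(\max\biggl\{\frac{\log^{\frac{3}{2}} n}{n k^{\frac{1}{2}}},\ \frac{k^{\frac{3}{2} + \frac{\alpha}{\alpha+d} - \epsilon}}{n^{1 + \frac{\alpha}{\alpha+d} - \epsilon}},\ \frac{k^{\frac{3}{2} + \frac{2\beta}{d}}}{n^{1 + \frac{2\beta}{d}}},\ \frac{k^{(1 + \frac{2\beta}{d})(\frac{\alpha}{\alpha+d} - \epsilon)}}{n^{1 + \frac{\alpha}{\alpha+d} - \epsilon}},\ \frac{k^{\frac{1}{2} + \frac{\beta}{d}} \log n}{n^{1 + \frac{\beta}{d}}}\biggr\}\biggr).
\]
The equation (47), together with the bounds on $W_1, \ldots, W_4$ just proved, establishes the claim (46). We finally conclude from (45) and (46) that
\begin{align*}
\mathrm{Var}(\hat{H}_n^w) &= \frac{1}{n} \mathrm{Var}\biggl(\sum_{j=1}^k w_j \log \xi_{(j),1}\biggr) + \Bigl(1 - \frac{1}{n}\Bigr) \mathrm{Cov}\biggl(\sum_{j=1}^k w_j \log \xi_{(j),1},\ \sum_{l=1}^k w_l \log \xi_{(l),2}\biggr) \\
&= \frac{V(f)}{n} + o(n^{-1}),
\end{align*}
as required.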
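The final variance decomposition uses only the exchangeability of the $n$ summands of the estimator. As a small check of that step, the sketch below compares the full double sum of covariances with the two-term formula; the function names and the numerical inputs are hypothetical.

```python
def variance_of_mean_double_sum(n, var_single, cov_pair):
    """Variance of the average of n exchangeable terms, summing all n^2 covariances."""
    total = 0.0
    for a in range(n):
        for b in range(n):
            total += var_single if a == b else cov_pair
    return total / n ** 2

def variance_of_mean_formula(n, var_single, cov_pair):
    """The same variance written as Var/n + (1 - 1/n) Cov."""
    return var_single / n + (1.0 - 1.0 / n) * cov_pair
```

The formula makes clear why the covariance between two distinct summands has to be $o(n^{-1})$: a covariance of exact order $n^{-1}$ would contribute at the same order as the leading term $V(f)/n$.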

A Proof of Theorem 8 EFFICIENT ENTROPY ESTIMATION 33 Proof of Theorem 8 For the first part of the theorem we aim to apply Theorem 2521 of va er Vaart 1998, a follow the otatio use there With Ṗ := λlog f + Hf : λ R} we will first show that the etropy fuctioal H is ifferetiable at f relative to the taget set Ṗ, with efficiet ifluece fuctio ψ f = log f Hf Followig Example 2516 i va er Vaart 1998, for g Ṗ, the paths f t,g efie i 10 of the mai text are ifferetiable i quaratic mea at t = 0 with score fuctio g Note that X gf = 0 a X g2 f < for all g Ṗ It is coveiet to efie, for t 0, the set A t := x X : 8t gx 1}, o which we may expa e 2tg easily as a Taylor series By Höler s iequality, for ɛ 0, 1/2, f log f 8t 21 ɛ f g 21 ɛ log f A c t X 8t 21 ɛ X as t 0 Moreover, f log1 + e 2tg A c t A c t } 1 ɛ g 2 f f log f 1/ɛ} ɛ = ot X log 2 + 2t g f 16t 2 4 log 2 + 1 g 2 f X We also have that ct 1 1 = 2 1 tg f X 1 + e 2tg e 2tg 1 + 2tg + tge 2tg 1 A t 1 + e 2tg f + 16 51 3 t2 g 2 f + 72t 2 g 2 f 72t 2 g 2 f A t X It follows that t 1 Hf t,g Hf} + log f + Hf}fg X = 1 1 2ct t X 1+e 2tg log f 1 fe 2tg 1 + 2tg + tge 2tg 1} log f t A t 2 2 log 1 + e 2tg 16 3 t g 2 f log f + 22t X X A c t A c t 1 + t g f 2ct 2ct 1+e 2tg log 1+e 2tg + tg1 + log f g 2 f + o1 0 + tg1 + e 2tg + o1 } f imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

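The conclusion (11) in the main text therefore follows from van der Vaart (1998), Theorem 25.21.

We now establish the second part of the theorem. First, by our previous bound on $c(t)$ in (51), for $12t < \{\int_{\mathcal{X}} g^2 f\}^{-1/2}$ we have that
\[
\|f_{t,g}\|_\infty \leq 2c(t)\|f\|_\infty \leq \frac{2\|f\|_\infty}{1 - 72 t^2 \int_{\mathcal{X}} g^2 f} \leq 4\|f\|_\infty,
\]
and $\mu_\alpha(f_{t,g}) \leq 4\mu_\alpha(f)$. We now study the smoothness properties of $f_{t,g}$. This requires some involved calculations, because we first need to understand corresponding properties of $g$. To this end, for an $m$ times differentiable function $g : \mathbb{R}^d \to \mathbb{R}$, define
\[
M_g(x) := \max\biggl\{ \max_{t=1,\ldots,m} \|g^{(t)}(x)\|,\ \sup_{y \in B_x(r_a(x))} \frac{\|g^{(m)}(y) - g^{(m)}(x)\|}{\|y - x\|^{\beta - m}} \biggr\}
\]
and
\[
D_g := \max\biggl\{1,\ \sup_{\delta \in (0, \|f\|_\infty)}\ \sup_{x : f(x) \geq \delta} \frac{M_g(x)}{a(\delta)^{m+1}} \biggr\}.
\]
Let $\mathcal{J}_m$ denote the set of multisets of elements of $\{1, \ldots, d\}$ of cardinality at most $m$, and for $J = \{j_1, \ldots, j_s\} \in \mathcal{J}_m$, define
\[
g_J(x) := \frac{\partial^s g}{\prod_{l=1}^s \partial x_{j_l}}(x).
\]
Moreover, for $i \in \{1, \ldots, s\}$, let $\mathcal{P}_i(J)$ denote the set of partitions of $J$ into $i$ non-empty multisets. As an illustration, if $d = 2$, then
\begin{align*}
\mathcal{J}_3 = \bigl\{ &\emptyset, \{1\}, \{2\}, \{1,1\}, \{1,2\}, \{2,1\}, \{2,2\}, \\
&\{1,1,1\}, \{1,1,2\}, \{1,2,1\}, \{1,2,2\}, \{2,1,1\}, \{2,1,2\}, \{2,2,1\}, \{2,2,2\} \bigr\}.
\end{align*}
Moreover, if $J = \{1,1,2\} \in \mathcal{J}_3$, then
\[
\mathcal{P}_2(J) = \Bigl\{ \bigl\{\{1,1\}, \{2\}\bigr\},\ \bigl\{\{1,2\}, \{1\}\bigr\},\ \bigl\{\{1,2\}, \{1\}\bigr\} \Bigr\}.
\]
Then, by induction, and writing $g := g_1 = \log f + H(f)$, it may be shown that
\[
g_J(x) = \sum_{i=1}^{\mathrm{card}(J)} \frac{(-1)^{i-1}(i-1)!}{f^i} \sum_{\{P_1, \ldots, P_i\} \in \mathcal{P}_i(J)} f_{P_1} \cdots f_{P_i}.
\]
Now, the cardinality of $\mathcal{P}_i(J)$ is given by a Stirling number of the second kind:
\[
\mathrm{card}\,\mathcal{P}_i(J) = \frac{1}{i!} \sum_{l=0}^i (-1)^{i-l} \binom{i}{l} l^{\mathrm{card}(J)} =: S(\mathrm{card}(J), i),
\]
say.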
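The count of partitions can be checked directly for small multisets. The sketch below evaluates the explicit formula for the Stirling numbers of the second kind and, separately, enumerates the partitions of a multiset whose repeated labels are kept distinct by position, which is the convention implied by the example with $J = \{1,1,2\}$. The function names are hypothetical and the code is an illustration only.

```python
from math import comb, factorial

def stirling2(s, i):
    """Stirling number of the second kind S(s, i) via the explicit alternating sum."""
    total = sum((-1) ** (i - l) * comb(i, l) * l ** s for l in range(i + 1))
    return total // factorial(i)

def partitions_into_blocks(items, i):
    """All partitions of the list `items` into i non-empty blocks.

    Repeated labels are treated as distinct positions, so a partition can
    appear more than once when `items` has repeated entries.
    """
    items = list(items)
    if not items:
        return [[]] if i == 0 else []
    first, rest = items[0], items[1:]
    out = []
    # Either `first` joins one of the i blocks of a partition of `rest` ...
    for part in partitions_into_blocks(rest, i):
        for b in range(len(part)):
            out.append(part[:b] + [[first] + part[b]] + part[b + 1:])
    # ... or it forms a block on its own.
    for part in partitions_into_blocks(rest, i - 1):
        out.append([[first]] + part)
    return out

def bell(s):
    """Bell number: total number of partitions of a set of size s."""
    return 1 if s == 0 else sum(stirling2(s, i) for i in range(1, s + 1))
```

For instance, `partitions_into_blocks([1, 1, 2], 2)` returns three partitions, one of type $\{\{1,1\},\{2\}\}$ and two of type $\{\{1,2\},\{1\}\}$, in agreement with `stirling2(3, 2)`.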

say Thus, if carj m, the 52 g Jx carj i=1 EFFICIENT ENTROPY ESTIMATION 35 i 1!S carj, i afx i 1 2 mm+1 m!afx m Moreover, if y x r a x a m 1, the g Jy g Jx carj i=1 i 1! P 1,,P i } P i J Now, by Lemma 2, f i y f i x 1 i fy fx 1 1 + fy fx 1 i 1 Moreover, by iuctio a Lemma 2 agai, fp1 f Pi y f P1 f Pi x f i y + f P 1 f Pi x f i } y f i y f i x 1 71 i 1i fy 56 fx 1 f P1 f Pi y f P1 f Pi x 8 1/2 71 i } 1 afx i f i x y x β m 56 We euce that eve whe m = 0, 53 gjy gjx 8 1/2 71 mm!m + 1 m+2 afx m+1 y x β m 41 Comparig 52 a 53, we see that 54 D g 8 1/2 71 mm!m + 1 m+2 =: D 41 Now let qy := 1 + e 2ty 1, so that f t,g x = 2ctq gx fx Similar iuctive argumets to those use above yiel that whe J J m with m 1 a g is m times ifferetiable, q g J x = carj i=1 q i gx P 1,,P i } P i J a we ow bou the erivatives of q By iuctio, q i y = 2t i i 1 i l a i l e 2tly 1 + e 2ty l+1, l=1 g P1 g Pi x, imsart-aos ver 2014/10/16 file: WKLAoSFialSupptex ate: Jauary 28, 2018

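where for each $i \in \mathbb{N}$, we have $a^{(i)}_1 = 1$, $a^{(i)}_i = i!$ and $a^{(i)}_l = l\bigl(a^{(i-1)}_l + a^{(i-1)}_{l-1}\bigr)$ for $l \in \{2, \ldots, i-1\}$. Since $\max_{1 \leq l \leq i} a^{(i)}_l \leq (2i)^{i-1}$, again by induction, we deduce that
\[
(1 + e^{-2ty})\,|q^{(i)}(y)| \leq 2^{2i-1} i^i t^i. \tag{55}
\]
Writing $s := \mathrm{card}(J)$, it follows that
\begin{align*}
|(q \circ g)_J(x)| &\leq q(g(x)) \sum_{i=1}^s 2^{2i-1} i^i t^i\, S(s,i)\, a(f(x))^{i(m+1)} D_g^i \\
&\leq q(g(x))\, s^{s+1} 2^{2s-1} \max(1, t^s)\, B_s\, a(f(x))^{s(m+1)} D_g^s, \tag{56}
\end{align*}
where $B_s := \sum_{i=1}^s S(s,i)$ denotes the $s$th Bell number. We can now apply the multivariate Leibniz rule, so that for a multi-index $\omega = (\omega_1, \ldots, \omega_d)$ with $|\omega| \leq m$, and for $t \leq 1$ and $m \geq 1$,
\begin{align*}
\Bigl|\frac{\partial^\omega f_{t,g}}{\partial x^\omega}(x)\Bigr| &= 2c(t) \biggl| \sum_{\nu : \nu \leq \omega} \binom{\omega}{\nu} \frac{\partial^\nu (q \circ g)}{\partial x^\nu}(x)\, \frac{\partial^{\omega - \nu} f}{\partial x^{\omega - \nu}}(x) \biggr| \\
&\leq 2^{3m-1} m^{m+1} B_m D_g^m\, a(f(x))^{m^2+m} f_{t,g}(x). \tag{57}
\end{align*}
Now, in order to control $\bigl|\frac{\partial^\omega f_{t,g}}{\partial x^\omega}(y) - \frac{\partial^\omega f_{t,g}}{\partial x^\omega}(x)\bigr|$, we first note that by (53) and (54), we have for $\|y - x\| \leq r_a(x)$, $i \in \mathbb{N}$, $J \in \mathcal{J}_m$ with $\mathrm{card}(J) = s$ and $\{P_1, \ldots, P_i\} \in \mathcal{P}_i(J)$,
\[
|g_{P_1} \cdots g_{P_i}(y) - g_{P_1} \cdots g_{P_i}(x)| \leq 2 D^i a(f(x))^{i(m+1)} \|y - x\|^{\beta - m}. \tag{58}
\]
Thus, by (55), (58), the mean value theorem and Lemma 2, for $t \leq 1$, $\|y - x\| \leq r_a(x)$ and $m \geq 1$,
\begin{align*}
|(q \circ g)_J(y) - (q \circ g)_J(x)| &\leq \sum_{i=1}^s |q^{(i)}(g(x))| \sum_{\{P_1, \ldots, P_i\} \in \mathcal{P}_i(J)} |g_{P_1} \cdots g_{P_i}(y) - g_{P_1} \cdots g_{P_i}(x)| \\
&\quad + \sum_{i=1}^s |q^{(i)}(g(y)) - q^{(i)}(g(x))| \sum_{\{P_1, \ldots, P_i\} \in \mathcal{P}_i(J)} |g_{P_1} \cdots g_{P_i}(y)| \\
&\leq D^m q(g(x))\, a(f(x))^{m^2+m+1} \|y - x\|^{\beta - m} B_m 2^{3m+5} d^{1/2} (m+1)^{m+1}\, \frac{1 + e^{-2tg(x)}}{e^{-2tg(x)} + e^{-2t|g(y) - g(x)|}} \\
&\leq D^m q(g(x))\, a(f(x))^{m^2+m+1} \|y - x\|^{\beta - m} B_m 2^{3m+5} d^{1/2} (m+1)^{m+1} \Bigl(\frac{56}{41}\Bigr)^{2t}. \tag{59}
\end{align*}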

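Here, to obtain the final inequality, we use the fact that $\frac{1+a}{b+a} \leq \frac{1}{b}$ for $a, b > 0$ and $b < 1$, and the fact that
\[
e^{2t|g(y) - g(x)|} = \max\Bigl\{\Bigl(\frac{f(y)}{f(x)}\Bigr)^{2t},\ \Bigl(\frac{f(x)}{f(y)}\Bigr)^{2t}\Bigr\} \leq \Bigl(\frac{56}{41}\Bigr)^{2t}
\]
for $\|y - x\| \leq r_a(x)$. Using the multivariate Leibniz rule again, together with (56), (59) and Lemma 2, for $t \leq 1$, $\|y - x\| \leq r_a(x)$ and $|\omega| = m \geq 1$,
\begin{align*}
\Bigl|\frac{\partial^\omega f_{t,g}}{\partial x^\omega}(y) - \frac{\partial^\omega f_{t,g}}{\partial x^\omega}(x)\Bigr| &\leq 2c(t) \sum_{\nu : \nu \leq \omega} \binom{\omega}{\nu} \biggl\{ \Bigl|\frac{\partial^{\omega-\nu} f}{\partial x^{\omega-\nu}}(y)\Bigr| \Bigl|\frac{\partial^\nu (q \circ g)}{\partial x^\nu}(y) - \frac{\partial^\nu (q \circ g)}{\partial x^\nu}(x)\Bigr| \\
&\hspace{9em} + \Bigl|\frac{\partial^\nu (q \circ g)}{\partial x^\nu}(x)\Bigr| \Bigl|\frac{\partial^{\omega-\nu} f}{\partial x^{\omega-\nu}}(y) - \frac{\partial^{\omega-\nu} f}{\partial x^{\omega-\nu}}(x)\Bigr| \biggr\} \\
&\leq 2^{4m+9} d^{1/2} B_m (m+1)^{m+1} D^m a(f(x))^{m^2+m+1} f_{t,g}(x) \|y - x\|^{\beta - m} \\
&=: C_m D^m a(f(x))^{m^2+m+1} f_{t,g}(x) \|y - x\|^{\beta - m}. \tag{60}
\end{align*}
This also holds in the case $m = 0$. Now note that if $12t < \{\int_{\mathcal{X}} g^2 f\}^{-1/2}$ we have
\[
f(x) = \frac{1 + e^{-2tg(x)}}{2c(t)} f_{t,g}(x) \geq \frac{f_{t,g}(x)}{4}.
\]
Finally, define the function
\[
\tilde{a}(\delta) := d^{m/2} C_m D^m a(\delta/4)^{m^2+m+1}. \tag{61}
\]
Then $\tilde{a} \in \mathcal{A}$ and from (57) and (60), we have $M_{f_{t,g},\tilde{a},\beta}(x) \leq \tilde{a}(f_{t,g}(x))$. We conclude that for $t < \min\bigl(1, \{144 \int_{\mathcal{X}} g^2 f\}^{-1/2}\bigr)$, we have that $f_{t,g} \in \mathcal{F}_{d,\tilde{\theta}}$, where $\tilde{\theta} = (\alpha, \beta, 4\gamma, 4\nu, \tilde{a}) \in \Theta$. The result follows on noting that $f_{t,g_\lambda} = f_{t\lambda,g}$.

References

Berrett, T. B., Samworth, R. J. and Yuan, M. (2017) Efficient multivariate entropy estimation via k-nearest neighbour distances. Submitted.

Götze, F. (1991) On the rate of convergence in the multivariate CLT. Ann. Probab., 19, 724-739.

van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge University Press, Cambridge.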

Statistical Laboratory
Wilberforce Road
Cambridge CB3 0WB
United Kingdom
E-mail: r.samworth@statslab.cam.ac.uk
E-mail: t.berrett@statslab.cam.ac.uk
URL: http://www.statslab.cam.ac.uk/~rjs57
URL: http://www.statslab.cam.ac.uk/~tbb26

Department of Statistics
University of Wisconsin-Madison
Medical Sciences Center
1300 University Avenue
Madison, WI 53706
United States of America
E-mail: myuan@stat.wisc.edu
URL: http://pages.stat.wisc.edu/~myuan/