
2 Thanks
Collaborators: Felix Abramovich, Yoav Benjamini, David Donoho, Noureddine El Karoui, Peter Forrester, Gérard Kerkyacharian, Debashis Paul, Dominique Picard, Bernard Silverman
Presentation help: B. Narasimhan, N. El Karoui, J-M Corcuera
Grant support: NIH, NSF, ARC (via P. Hall)

3 Abraham Wald

4 Three Talks
1. Function Estimation & Classical Normal Theory: $X_n \sim N_{p(n)}(\theta_n, I)$, $p(n) \to \infty$ with $n$ (MVN)
2. The Threshold Selection Problem: in (MVN) with, say, $\hat\theta_i = X_i\, I\{|X_i| > \hat t\}$. How to select $\hat t = \hat t(X)$ reliably?
3. Large Covariance Matrices: $X_n \sim N_{p(n)}(I \otimes \Sigma_{p(n)})$; especially $X_n = [Y_n\ Z_n]$. Spectral properties of $n^{-1} X_n X_n^T$ ⇒ PCA, CCA, MANOVA

5 Focus on Gaussian Models
Cue from Wald (1943), Trans. Amer. Math. Soc., an inspiration for Le Cam's theory of Local Asymptotic Normality (LAN)

6 Focus on Gaussian Models, ctd.
Growing models: $p_n \to \infty$ or $p_n \gg n$ is now commonplace (classification, genomics, ...)
Reality check: but no real data is Gaussian...
Yes (in many fields), and yet: consider the power of fable and fairy tale...

7 1. Function Estimation and Classical Normal Theory
Theme: the Gaussian white noise model & multiresolution point of view allow the classical parametric theory of $N_d(\theta, I)$ to be exploited in nonparametric function estimation.
Reference: book manuscript in (non-)progress: Function Estimation and Gaussian Sequence Models, www-stat.stanford.edu/~imj

8 Agenda
Classical Parametric Ideas
Nonparametric Estimation and Growing Gaussian Models:
I. Kernel Estimation and James-Stein Shrinkage
II. Thresholding and Sparsity
III. Bernstein-von Mises phenomenon

9 Classical Ideas
a) Multinormal shift model: $X_1, \dots, X_n$ data from $P_\eta(dx) = f_\eta(x)\,\mu(dx)$, $\eta \in H \subset \mathbb{R}^d$. Let $I_0$ be the Fisher information matrix at $\eta_0$. Local asymptotic Gaussian approximation:
$$\{P^n_{\eta_0 + \theta/\sqrt{n}},\ \theta \in \mathbb{R}^d\} \;\approx\; \{N_d(\theta, I_0^{-1}),\ \theta \in \mathbb{R}^d\}$$

10 Classical Ideas, ctd.
b) ANOVA, Projections and Model Selection:
$$Y_{n\times 1} = X_{n\times d}\,\beta_{d\times 1} + \sigma\epsilon$$
Submodels: $X = [X_0\ X_1]$, projections $P_{X_0}$. Canonical form: $y = \theta + \sigma z$. Projections:
$$(P_0 y)_i = \begin{cases} y_i & i \in I_0 \\ 0 & \text{otherwise} \end{cases}$$
MSE:
$$E\|P_0 y - \theta\|^2 = \sum_{i \in I_0} \sigma^2 + \sum_{i \notin I_0} \theta_i^2$$

11 Classical Ideas, ctd.
c) Minimax estimation of $\theta$:
$$\inf_{\hat\theta}\, \sup_{\theta \in \mathbb{R}^d} E_\theta \|\hat\theta(y) - \theta\|^2 = d\sigma^2,$$
attained by the MLE $\hat\theta^{MLE}(y) = y$.
d) James-Stein estimate
$$\hat\theta^{JS}(y) = \Big(1 - \frac{d-2}{\|y\|^2}\Big) y$$
dominates the MLE if $d \ge 3$.
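The domination is easy to see numerically. A minimal Monte Carlo sketch in Python, with $\sigma = 1$; the dimension, replication count, and the choice $\theta = 0$ (the most favorable case for James-Stein) are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 50, 20_000
theta = np.zeros(d)                          # illustrative signal; try others
y = theta + rng.standard_normal((reps, d))   # y ~ N_d(theta, I)

mle = y
js = (1 - (d - 2) / (y**2).sum(axis=1, keepdims=True)) * y  # James-Stein

risk = lambda est: ((est - theta) ** 2).sum(axis=1).mean()
print(f"MLE risk ~ {risk(mle):.1f} (theory: d = {d})")
print(f"JS  risk ~ {risk(js):.1f} (theory: about 2 when theta = 0)")
```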

12 Classical Ideas, ctd.
e) Conjugate priors. Prior: $\theta \sim N(0, \tau^2 I)$; likelihood: $y \mid \theta \sim N(\theta, \sigma^2 I)$; posterior: $\theta \mid y \sim N(\hat\theta_y, \sigma_y^2)$ with
$$\hat\theta_y = \frac{\tau^2}{\sigma^2 + \tau^2}\, y, \qquad \sigma_y^2 = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.$$
f) Unbiased risk estimate. $Y \sim N_d(\theta, I)$. For $g$ (weakly) differentiable,
$$E\|Y + g(Y) - \theta\|^2 = E\big[d + 2\nabla^T g(Y) + \|g(Y)\|^2\big] = E[U(Y)].$$
If $g = g(Y; t)$ then $U = U(Y; t)$.
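The identity in f) can be checked numerically. The sketch below (signal and threshold are illustrative) takes $g(y) = -\mathrm{sign}(y)\min(|y|, t)$ componentwise, so that $Y + g(Y)$ is soft thresholding and $U(Y;t) = d - 2\,\#\{i : |Y_i| \le t\} + \sum_i \min(Y_i^2, t^2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, reps, t = 100, 50_000, 2.0
theta = np.concatenate([3 * np.ones(10), np.zeros(d - 10)])  # illustrative signal
Y = theta + rng.standard_normal((reps, d))                   # Y ~ N_d(theta, I)

g = -np.sign(Y) * np.minimum(np.abs(Y), t)   # so Y + g(Y) = soft thresholding
true_risk = ((Y + g - theta) ** 2).sum(axis=1).mean()

# Stein's unbiased risk estimate for soft thresholding
U = d - 2 * (np.abs(Y) <= t).sum(axis=1) + np.minimum(Y**2, t**2).sum(axis=1)
print(f"true risk ~ {true_risk:.2f}, mean U(Y;t) ~ {U.mean():.2f}")  # should agree
```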

13 Agenda
Classical Parametric Ideas
Nonparametric Estimation and Growing Gaussian Models:
I. Kernel Estimation and James-Stein Shrinkage
II. Thresholding and Sparsity
III. Bernstein-von Mises phenomenon

14 Nonparametric function estimation
Fix and Hodges (1951): nonparametric classification
- spectrum estimation in time series
- kernel methods for density estimation, regression
- roughness penalty methods (splines)
Techniques largely differ from parametric normal theory.
Ibragimov & Hasminskii (and school): importance of the Gaussian white noise (GWN) model:
$$Y_t = \int_0^t f(s)\,ds + \epsilon W_t, \qquad 0 \le t \le 1$$

15-17 Motivation for GWN model
GWN model emerges as the large-sample limit of equispaced regression, density estimation, spectrum estimation, ...
$$y_j = f(j/n) + \sigma w_j, \quad j = 1, \dots, n, \qquad \epsilon = \sigma/\sqrt{n}$$
Partial sums:
$$\frac{1}{n}\sum_{j=1}^{[nt]} y_j = \frac{1}{n}\sum_{j=1}^{[nt]} f(j/n) + \frac{\sigma}{\sqrt n}\cdot\frac{1}{\sqrt n}\sum_{j=1}^{[nt]} w_j \;\longrightarrow\; Y_t = \int_0^t f(s)\,ds + \frac{\sigma}{\sqrt n}\, W_t$$
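A short simulation sketch of this limit (the regression function and sample size are illustrative): the partial-sum process of the regression data tracks $\int_0^t f$ up to fluctuations of order $\sigma/\sqrt n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 10_000, 1.0
f = lambda s: np.sin(2 * np.pi * s)            # illustrative regression function
j = np.arange(1, n + 1)
y = f(j / n) + sigma * rng.standard_normal(n)  # equispaced regression data

# Partial-sum process (1/n) * sum_{j <= nt} y_j on a grid of t values
t_grid = np.linspace(0, 1, 101)
Y_t = np.array([y[: int(n * t)].sum() / n for t in t_grid])
F_t = (1 - np.cos(2 * np.pi * t_grid)) / (2 * np.pi)   # exact integral of f
print(np.abs(Y_t - F_t).max(), "vs noise scale", sigma / np.sqrt(n))
```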

18 Series form of WN model
$$Y_t = \int_0^t f(s)\,ds + \epsilon W_t$$
For any orthonormal basis $\{\psi_\lambda, \lambda \in \Lambda\}$ of $L^2[0,1]$,
$$\int \psi_\lambda \, dY = \int \psi_\lambda f \, dt + \epsilon \int \psi_\lambda \, dW$$
$$y_\lambda = \theta_\lambda + \epsilon z_\lambda, \qquad (z_\lambda) \text{ i.i.d. } N(0,1), \ \lambda \in \Lambda$$
The Parseval relation implies $\int (\hat f - f)^2 = \sum (\hat\theta_\lambda - \theta_\lambda)^2 = \|\hat\theta - \theta\|^2$, etc.
⇒ analysis of infinite sequences in $\ell^2(\mathbb{N})$
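A finite-dimensional sketch of the same point: for any orthogonal matrix $Q$ (a stand-in for the orthonormal basis; signal and noise level are illustrative), the coefficients of noisy data are again signal plus i.i.d. Gaussian noise, and Parseval preserves the squared error.

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 256, 0.1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns: an orthonormal basis
f_vec = np.sin(2 * np.pi * np.arange(n) / n)      # discretized signal
z = rng.standard_normal(n)
y = f_vec + eps * z

theta = Q.T @ f_vec        # coefficients of the signal
y_coef = Q.T @ y           # y_lambda = theta_lambda + eps * z_lambda
# Q.T @ z is again i.i.d. N(0,1) since Q is orthogonal; Parseval:
print(np.allclose(((y - f_vec) ** 2).sum(), ((y_coef - theta) ** 2).sum()))  # True
```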

19 Multiresolution connection
Wavelet orthonormal bases $\{\psi_{jk}\}$ have a double index: level ("octave") $j = 1, 2, \dots$, location $k = 1, \dots, 2^j$.
Collect coefficients in a single level: $y_j = (y_{jk})$, $\theta_j = (\theta_{jk})$, $k = 1, \dots, 2^j$, etc.
Finite (but growing with $j$) multivariate normal model:
$$y_j \sim N_{2^j}(\theta_j, \epsilon^2 I), \qquad j = 1, 2, \dots$$
⇒ apply classical normal theory to the vectors $y_j$.

20 [Figure: some S8 symmlets $\psi_{jk}$ at various scales and locations, indexed by $(j,k)$ from $(4,3)$ up to $(8,202)$]

21 Example: Electrical Signal
[Figure] Nationwide electricity consumption in France. Here: 3 days (Thu, Fri, Sat), summer 1990, sampled at $\Delta t = 1$ min. From Misiti et al., Rev. Statist. Appliquée, 1994. N.B. malfunction of meters on days 2 and 3.

22 [Figure: electrical signal and its CDJV wavelet coefficients]
Often OK to treat a single level as $N_{2^j}(\theta_j, \epsilon^2 I)$.

23 [Figure: electrical signal and its CDJV wavelet coefficients, detail]
Or, apply the $N_d(\theta, I)$ model to a block within a level.

24 The Minimax principle
Given a loss function $L(a, \theta)$ and risk $R(\hat\theta, \theta) = E_\theta L(\hat\theta, \theta)$, choose the estimator $\hat\theta$ so as to minimize the maximum risk $\sup_{\theta \in \Theta} E_\theta L(\hat\theta, \theta)$.
- introduced into statistics by Wald
- L. J. Savage (1951): "the only rule of comparable generality proposed since [that of] Bayes was published in 1763"
- standard complaint: $\hat\theta$ depends on $\Theta$; optimizing on the worst case may be irrelevant

25 Simultaneous near-minimaxity
Shift in perspective on minimaxity: fix $\hat f$ in advance, an estimator to be evaluated over many spaces $\mathcal{F}$; compare $\sup_{\mathcal{F}} R(\hat f, f)$ to $R(\mathcal{F}) = \inf_{\hat f} \sup_{\mathcal{F}} R(\hat f, f)$.
Shift from "exact answer to the wrong problem" to "approximate answer to many (related) problems": an imperfect, yet serviceable tool.

26 Agenda
Classical Parametric Ideas
Nonparametric Estimation and Growing Gaussian Models:
I. Kernel Estimation and James-Stein Shrinkage
II. Thresholding and Sparsity
III. Bernstein-von Mises phenomenon

27 I. Kernel Estimation & James-Stein Shrinkage
Priestley-Chao estimator:
$$\hat f_h(t) = \frac{1}{Nh}\sum_{i=1}^N K\Big(\frac{t - t_i}{h}\Big)\, Y_i, \qquad \text{e.g. } K(x) = (1 - x^2)^4_+$$
Automatic choice of $h$? Huge literature (e.g. Wand & Jones, 1995).
James-Stein provides a simple, powerful approach.
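A direct sketch of the estimator (design, noise level, and bandwidth are illustrative; the kernel is left unnormalized exactly as on the slide, so $\hat f_h$ is off by the constant $\int K \approx 0.81$ unless rescaled):

```python
import numpy as np

def priestley_chao(t, t_obs, y_obs, h):
    """Priestley-Chao estimate: (1/(N h)) * sum_i K((t - t_i)/h) * Y_i."""
    K = lambda x: np.clip(1 - x**2, 0, None) ** 4   # K(x) = (1 - x^2)^4_+
    u = (t[:, None] - t_obs[None, :]) / h
    return K(u) @ y_obs / (len(t_obs) * h)

rng = np.random.default_rng(4)
N = 500
t_obs = np.arange(1, N + 1) / N                     # equispaced design
y_obs = np.sin(4 * np.pi * t_obs) + 0.3 * rng.standard_normal(N)
grid = np.linspace(0, 1, 101)
fhat = priestley_chao(grid, t_obs, y_obs, h=0.05)   # bandwidth h chosen by hand
```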

28 Fourier Form of Kernel Smoothing
Convolution ⇔ multiplication: DFT → shrink → iDFT.
Shrinkage factors $s_k(h) \in [0, 1]$ decrease with frequency.
Flatten the shrinkage in blocks: $\hat\theta_k = (1 - w_j)\, y_k$ for $k \in B_j$.
How to choose $w_j$? [Diagram: shrinkage factors vs. frequency]

29 James-Stein Shrinkage
For $y \sim N_d(\theta, I)$ and $d \ge 3$:
$$\hat\theta^{JS+} = \Big(1 - \frac{d - 2 + \eta}{\|y\|^2}\Big)_+ y$$
Unbiased estimate of risk:
$$E\|\hat\theta^{JS} - \theta\|^2 = E_\theta\Big\{d - \frac{(d-2)^2 - \eta^2}{\|y\|^2}\Big\} \le d$$

30 Smoothing via Blockwise James-Stein
Block Fourier or wavelet (here) coefficients: $y_j = (y_{jk})$, $k = 1, \dots, d_j = 2^j$; $\theta_j = (\theta_{jk})$, etc.
Apply J-S shrinkage to each block [refs]:
$$\hat\theta_j^{JS+} = \Big(1 - \frac{\beta_j\, \epsilon^2}{\|y_j\|^2}\Big)_+ y_j, \qquad j \le \log_2 N$$
$$\beta_j = \begin{cases} d_j - 2 & \text{(ExactJS)} \\ d_j + \sqrt{2 d_j}\,\sqrt{2 \log d_j} & \text{(ConsJS)} \end{cases}$$
For ConsJS, if $\theta_j = 0$, then $P\{\|y_j\|^2 < \beta_j \epsilon^2\} \to 1$, so $\hat\theta_j^{JS} = 0$.
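A compact sketch of one block update (noise level and levels are illustrative); ConsJS is tuned so that a pure-noise block is zeroed out with high probability:

```python
import numpy as np

def block_js(y, eps, mode="cons"):
    """Positive-part James-Stein shrinkage for one block y ~ N_d(theta, eps^2 I)."""
    d = len(y)
    if mode == "exact":
        beta = d - 2                                        # ExactJS
    else:
        beta = d + np.sqrt(2 * d) * np.sqrt(2 * np.log(d))  # ConsJS
    shrink = max(0.0, 1 - beta * eps**2 / np.sum(y**2))
    return shrink * y

rng = np.random.default_rng(5)
eps = 0.1
for j in range(3, 9):                          # apply level by level, d_j = 2^j
    y_j = eps * rng.standard_normal(2**j)      # a pure-noise level (theta_j = 0)
    print(j, np.sum(block_js(y_j, eps) ** 2))  # ConsJS returns exactly 0 w.h.p.
```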

31-32 Block James-Stein imitates Kernel
[Figure] Red: James-Stein equivalent kernels; blue: actual kernel.

33-34 Properties of Block James-Stein
Block JS imitates kernel smoothing, but with:
- near-automatic choice of bandwidth
- canonical choices of $\beta_j$
Easy theoretical analysis from unbiased risk bounds:
$$E\|\hat\theta_j^{JS} - \theta_j\|^2 \le 2\epsilon^2 + \|\theta_j\|^2 \wedge 2^j \epsilon^2$$
E.g. below: MSE for Hölder-smooth functions. For $0 < \delta < 1$,
$$|f(x) - f(y)| \le B\,|x - y|^\delta \quad \text{for all } x, y \qquad (*)$$
and $\alpha = r + \delta$, $r \in \mathbb{N}$, means $D^r f$ satisfies (*). In terms of wavelet coefficients:
$$f \in H^\alpha(C) \;\Rightarrow\; |\theta_{jk}| \le C\,2^{-(\alpha + 1/2)j} \quad \text{for all } j, k$$

35-36 A single level determines MSE
From the unbiased risk bounds and Hölder smoothness:
$$E\|\hat\theta_j^{JS} - \theta_j\|^2 \le \big[2\epsilon^2 + \|\theta_j\|^2 \wedge 2^j \epsilon^2\big], \qquad f \in H^\alpha(C) \;\Rightarrow\; \|\theta_j\|^2 \le C^2 2^{-2\alpha j}$$
Summing over levels $j \le J$, with a critical level $j_*$:
$$\sum_{j \le j_*} 2^j \epsilon^2 + \sum_{j > j_*} \|\theta_j\|^2 \;\asymp\; \text{MSE}$$

37 Growing Gaussians
[Figure: geometric decay of MSE contributions away from the critical level $j_*$]
Growing Gaussian aspect: as the noise $\epsilon = \sigma/\sqrt n$ decreases, the worst level $j_* = j_*(\epsilon, C)$ increases:
$$d_{j_*} = 2^{j_*} = c\,(C/\epsilon)^{2/(2\alpha + 1)}$$

38 MultiJS is rate-adaptive
The optimal rate for $H^\alpha(C)$ is $\epsilon^{2r}$, with $r = r(\alpha) = \frac{2\alpha}{2\alpha + 1}$. The JS risk bounds imply simultaneous near-minimaxity:
$$\sup_{H^\alpha(C)} R(\hat f^{JS}, f) \le c\,C^{2(1-r)}\epsilon^{2r}(1 + o(1)) \asymp R(H^\alpha(C)).$$
This holds for a single, prespecified estimator $\hat f^{JS}$, valid for all smoothness levels $\alpha \in (0, \infty)$ and bounds $C \in (0, \infty)$.
No speed limit to the rate of convergence ⇔ infinite-order kernel (for certain wavelets).
The conclusion follows easily from the single-level James-Stein analysis in the multinormal mean model.

39 Agenda
Classical Parametric Ideas
Nonparametric Estimation and Growing Gaussian Models:
I. Kernel Estimation and James-Stein Shrinkage
II. Thresholding and Sparsity
III. Bernstein-von Mises phenomenon

40-41 James-Stein Fails on Sparse Signals
$$\frac{d\,\|\theta\|^2}{d + \|\theta\|^2} \;\le\; R(\hat\theta^{JS}, \theta) \;\le\; 2 + \frac{d\,\|\theta\|^2}{d + \|\theta\|^2}$$
$r$-spike $\theta_r$: $r$ coordinates at $\sqrt{d/r}$, so $\|\theta\|^2 = d$ and $R(\hat\theta^{JS}, \theta_r) \ge d/2$.
... Hard thresholding at $t = \sqrt{2\log d}$:
$$R(\hat\theta^{HT}, \theta_r) \le d \cdot 2t\phi(t) + r\,[1 + \eta(d/r)] \approx r + 3\sqrt{2\log d}$$
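The gap is easy to reproduce numerically; a sketch with illustrative $d$ and $r$:

```python
import numpy as np

rng = np.random.default_rng(6)
d, r, reps = 1024, 8, 2000
theta = np.zeros(d)
theta[:r] = np.sqrt(d / r)                  # r-spike signal: ||theta||^2 = d
y = theta + rng.standard_normal((reps, d))

js = (1 - (d - 2) / (y**2).sum(axis=1, keepdims=True)) * y
t = np.sqrt(2 * np.log(d))
ht = y * (np.abs(y) > t)                    # hard thresholding at sqrt(2 log d)

risk = lambda est: ((est - theta) ** 2).sum(axis=1).mean()
print(f"JS risk ~ {risk(js):.0f} (d/2 = {d/2:.0f}); HT risk ~ {risk(ht):.0f}")
```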

42 A more systematic story for thresholding
Based on $\ell_p$ norms and balls: $\|\theta\|_p^p = \sum_{i=1}^d |\theta_i|^p$.
1. $\ell_p$ norms capture sparsity (mostly with $p < 2$)
2. thresholding arises from least-squares estimation with $\ell_p$ constraints
3. describe best possible estimation over $\ell_p$ balls
4. show that thresholding (nearly) attains the best possible

43-44 $\ell_p$ norms, $p < 2$, capture sparsity
[Figure: $\ell_1$ vs. $\ell_2$ balls] $\ell_1$ ball smaller than $\ell_2$ ⇒ expect better estimation.
Sparsity of representation depends on the basis: a rotation sends $(0, C, 0, \dots) \mapsto (C/\sqrt d, \dots, C/\sqrt d)$.

45 Thresholding from constrained least squares
$$\min \sum_i (y_i - \theta_i)^2 \ \text{ s.t. } \sum_i |\theta_i|^p \le C^p, \quad\text{i.e.}\quad \min \sum_i (y_i - \theta_i)^2 + \lambda \sum_i |\theta_i|^p$$
leads to:
- $p = 2$: linear shrinkage, $\hat\theta_i = (1 + \lambda)^{-1} y_i$
- $p = 1$: soft thresholding [with $\lambda' = \lambda/2$],
$$\hat\theta_i = \begin{cases} y_i - \lambda' & y_i > \lambda' \\ 0 & |y_i| \le \lambda' \\ y_i + \lambda' & y_i < -\lambda' \end{cases}$$
- $p = 0$: hard thresholding, $\hat\theta_i = y_i\, I\{|y_i| > \lambda\}$ [penalty $\lambda \sum_i I\{\theta_i \ne 0\}$]
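The three rules as code (a minimal sketch; the test vector is illustrative):

```python
import numpy as np

def linear_shrink(y, lam):   # p = 2 penalty: argmin (y - u)^2 + lam * u^2
    return y / (1 + lam)

def soft_threshold(y, t):    # p = 1 penalty, threshold t = lambda / 2
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def hard_threshold(y, t):    # p = 0 penalty: keep-or-kill
    return y * (np.abs(y) > t)

y = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])
print(soft_threshold(y, 1.0))   # [-2., -0., 0., 0.5, 3.]
print(hard_threshold(y, 1.0))   # [-3., -0., 0., 1.5, 4.]
```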

46-47 Bounds on best estimation on $\ell_p$ balls
Minimax risk: $R_{d,p}(C) = \inf_{\hat\theta}\, \sup_{\|\theta\|_p \le C} E_\theta \sum_{i=1}^d (\hat\theta_i - \theta_i)^2$.
Non-asymptotic bounds [Birgé-Massart]:
$$c_1\, r_{d,p}(C) \le R_{d,p}(C) \le c_2\, [\log d + r_{d,p}(C)]$$
[Figure: minimax risk over $\ell_p$ balls and the bound for thresholding]
$\ell_1$ ball smaller ⇒ (much) smaller minimax risk.

48 Thresholding nearly attains the bound
For simplicity, take threshold $t = \sqrt{2\log d}$. Oracle inequality for soft thresholding [non-asymptotic!]:
$$E\|\hat\theta^{ST} - \theta\|^2 \le (2\log d + 1)\Big[1 + \sum_i (\theta_i^2 \wedge 1)\Big]$$
Apply to the $\ell_1$ ball $\{\theta : \|\theta\|_1 \le C\}$:
$$\sup_{\|\theta\|_1 \le C} E\|\hat\theta^{ST} - \theta\|^2 \le (2\log d + 1)(C + 1).$$
Better bounds are possible for data-dependent thresholds $t = t(Y)$ (Lec. 2).

49 Summary for $N_d(\theta, I)$
For $X \sim N_d(\theta, I)$, comparing James-Stein shrinkage ($\beta = d - 2$) and soft thresholding at $t = \sqrt{2\log d}$:
James-Stein shrinkage, orthogonally invariant:
$$\tfrac12\,(\|\theta\|^2 \wedge d) \;\le\; E\|\hat\theta^{JS} - \theta\|^2 \;\le\; 2 + (\|\theta\|^2 \wedge d)$$
Thresholding, coordinatewise and coordinate dependent:
$$\tfrac12 \sum_i (\theta_i^2 \wedge 1) \;\le\; E\|\hat\theta^{ST} - \theta\|^2 \;\le\; (2\log d + 1)\Big(1 + \sum_i (\theta_i^2 \wedge 1)\Big)$$

50 Implications for Function Estimation
Key example: functions of bounded total variation,
$$TV(C) = \{f : \|f\|_{TV} \le C\}, \qquad \|f\|_{TV} = \sup_{t_1 < \dots < t_N} \sum_{i=1}^{N-1} |f(t_{i+1}) - f(t_i)| + \|f\|_1$$
Well captured by weighted combinations of $\ell_1$ norms on wavelet coefficients:
$$c_1 \sup_j 2^{j/2}\,\|\theta_j\|_1 \;\le\; \|f\|_{TV} \;\le\; c_2 \sum_j 2^{j/2}\,\|\theta_j\|_1$$
Best possible (minimax) MSE:
$$R(TV(C), \epsilon) = \inf_{\hat f}\, \sup_{f \in TV(C)} E\|\hat f - f\|^2$$
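For sampled data the variation part of this norm reduces to a sum of absolute first differences; a small sketch (the $+\|f\|_1$ term of the slide's norm is omitted here):

```python
import numpy as np

def tv_seminorm(f_samples):
    """Discrete total variation: sum of absolute successive differences."""
    return np.abs(np.diff(f_samples)).sum()

t = np.linspace(0, 1, 1001)
print(tv_seminorm(np.sin(2 * np.pi * t)))   # ~ 4.0: swings 0 -> 1 -> -1 -> 0
```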

51 Reduction to single levels
$$\sup_\theta E\|\hat\theta - \theta\|^2 = \sum_j \sup_{\theta_j} E\|\hat\theta_j - \theta_j\|^2$$
Apply the thresholding bounds for each $j$, to $N_{2^j}(\theta_j, \epsilon^2 I)$:
- a worst-case level $j_* = j_*(\epsilon, C)$
- geometric decay of $\sup E\|\hat\theta_j - \theta_j\|^2$ as $j$ moves away from $j_*$
Growing Gaussian aspect: as the noise $\epsilon = \sigma/\sqrt n$ decreases, the worst level $j_* = j_*(\epsilon, C)$ increases: $d_{j_*} = 2^{j_*} = c\,(C/\epsilon)^{2/3}$

52 Final Result
$$\sup_{TV(C)} R(\hat f^{thr}, f) \le c\,C^{2(1-r)}\epsilon^{2r}, \quad r = 2/3; \qquad \sup_{TV(C)} R(\hat f^{JS}, f) \le c\,C^{2(1-r_L)}\epsilon^{2r_L}, \quad r_L = 1/2$$
[Figure panels: Blocks signal; noisy version; $\sqrt{2\log d}$ threshold reconstruction; block JS shrinkage]

53 Agenda
Classical Parametric Ideas
Nonparametric Estimation and Growing Gaussian Models:
I. Kernel Estimation and James-Stein Shrinkage
II. Thresholding and Sparsity
III. Bernstein-von Mises phenomenon

54 Bernstein-von Mises Phenomenon
Asymptotic match of frequentist & Bayesian confidence intervals. Classical version: $Y_1, \dots, Y_n$ i.i.d. $p_\theta(y)\,d\mu$, $\theta \in \Theta \subset \mathbb{R}^d$, $d$ fixed. Under mild conditions, as $n \to \infty$,
$$\big\| P_{\theta \mid Y} - N(\hat\theta^{MLE},\ n^{-1} I_{\theta_0}^{-1}) \big\| \xrightarrow{P_{n,\theta_0}} 0,$$
where $I_\theta = E_\theta[\nabla_\theta \log p_\theta][\nabla_\theta \log p_\theta]^T$ is the Fisher information and $\|P - Q\| = \max_A |P(A) - Q(A)|$ is the total variation distance.

55 Non-parametric Regression
$$Y_i = f(i/n) + \sigma_0 w_i, \qquad i = 1, \dots, n$$
For typical smoothness priors (e.g. $d$-th integrated Wiener process prior) Bernstein-von Mises fails [Dennis Cox, D. A. Freedman]: the frequentist and posterior laws of $\|\hat f - f\|_2^2$ mismatch in both centering and scale.
Here, revisit via an elementary growing Gaussian model approach; apply to single levels of the wavelet decomposition.

56 Growing Gaussian Model
Data: $Y_1, \dots, Y_n \mid \theta$ i.i.d. $N_p(\theta, \sigma_0^2 I)$. Prior: $\theta \sim N_p(0, \tau_n^2 I)$.
Growing: $p = p(n) \to \infty$ with $n$; $\sigma_n^2 = \mathrm{Var}(\bar Y_n \mid \theta) = \sigma_0^2/n$; $\tau_n^2$ may depend on $n$.
Goal: compare $\mathcal{L}(\hat\theta^{MLE} - \theta)$, $\mathcal{L}(\hat\theta^{Bayes} - \theta)$ with $\mathcal{L}(\theta \mid Y)$.
Posterior: all Gaussian, so centering and scaling are key:
$$\mathcal{L}(\theta \mid \bar Y = \bar y) = N_p\big(\hat\theta^B = w_n \bar y,\ w_n \sigma_n^2 I\big), \qquad w_n = \frac{\tau_n^2}{\sigma_n^2 + \tau_n^2}$$

57 Growing Gaussians and BvM
Correspondences:
$$P_{\theta \mid Y} \leftrightarrow N_p(w_n \bar y,\ w_n \sigma_n^2 I), \qquad N(\hat\theta^{MLE},\ n^{-1} I_{\theta_0}^{-1}) \leftrightarrow N_p(\bar y,\ \sigma_n^2 I)$$
For BvM to hold, we now need $w_n \to 1$ sufficiently fast.
Proposition: in the growing Gaussian model,
$$\big\| P_{\theta \mid Y} - N(\hat\theta^{MLE},\ n^{-1} I_{\theta_0}^{-1}) \big\| \xrightarrow{P_{n,\theta_0}} 0$$
if and only if
$$\sqrt{p}\,\frac{\sigma_n^2}{\tau_n^2} = \frac{\sqrt{p}}{n}\,\frac{\sigma_0^2}{\tau_n^2} \to 0, \quad\text{i.e.}\quad w_n = 1 - o\Big(\frac{1}{\sqrt{p_n}}\Big).$$
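A numerical sketch of the proposition (sample size, dimension, and prior variances are illustrative): a Monte Carlo estimate of the total variation distance between the posterior and its BvM target, which is small when $\sqrt{p}\,\sigma_n^2/\tau_n^2$ is small and near 1 when it is not.

```python
import numpy as np

rng = np.random.default_rng(8)

def tv_gauss_mc(m1, s1, m2, s2, n_mc=50_000):
    """Monte Carlo TV distance between N(m1, s1^2 I) and N(m2, s2^2 I)."""
    p = len(m1)
    x = m1 + s1 * rng.standard_normal((n_mc, p))          # sample from the first law
    logp = -0.5 * ((x - m1) ** 2).sum(1) / s1**2 - p * np.log(s1)
    logq = -0.5 * ((x - m2) ** 2).sum(1) / s2**2 - p * np.log(s2)
    return 0.5 * np.abs(1.0 - np.exp(logq - logp)).mean()

n, sigma0 = 10_000, 1.0
sig_n = sigma0 / np.sqrt(n)                     # sigma_n^2 = sigma_0^2 / n
for p, tau2 in [(200, 1.0), (200, sig_n**2)]:   # tau_n^2 large vs. tau_n^2 ~ sigma_n^2
    w = tau2 / (sig_n**2 + tau2)                # posterior shrinkage factor
    ybar = sig_n * rng.standard_normal(p)       # data drawn under theta_0 = 0
    tv = tv_gauss_mc(w * ybar, np.sqrt(w) * sig_n, ybar, sig_n)
    cond = np.sqrt(p) * sig_n**2 / tau2
    print(f"p={p}, sqrt(p)*sigma_n^2/tau_n^2={cond:.1e}, TV ~ {tv:.3f}")
```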

58 Example: Pinsker priors
Back to regression: $dY_t = f(t)\,dt + \sigma_n\,dW_t$. Minimax MSE linear estimation of $f$ over $\{f : \int_0^1 (D^\alpha f)^2 \le C^2\}$.
Least favourable prior on wavelet coefficients (for sample size $n$): $\theta_{jk}$ indep. $N(0, \tau_j^2)$,
$$\tau_j^2 = \sigma_n^2\,(\lambda_n 2^{-j\alpha} - 1)_+, \qquad \lambda_n = c\,(C/\sigma_n)^{2\alpha/(2\alpha+1)}$$
The critical level $j_* = j_*(n, \alpha)$ grows with $n$ (growing Gaussian model again).

59 Validity of B-vM depends on level
The Bayes estimator for the Pinsker prior attains the exact minimax MSE (asymptotically). But Bernstein-von Mises fails at the critical level $j_*(n)$:
At $j_*(n)$ [fine-scale features]: $\tau_j^2/\sigma_n^2 \to 2^\alpha - 1$, so
$$1 - w_n = \frac{1}{1 + \tau_j^2/\sigma_n^2} \to 2^{-\alpha} \quad\Rightarrow\quad \text{BvM fails}$$
At fixed $j_0$ [coarse scale]: $p = 2^{j_0}$ FIXED (or growing slowly), $\tau_{j_0}^2/\sigma_n^2 = \lambda_n 2^{-j_0\alpha} - 1 \to \infty$, so $1 - w_n \to 0$ ⇒ BvM holds.
The difficulty lies with high-dimensional features.

60 Three Talks
1. Function Estimation & Classical Normal Theory: $X_n \sim N_{p(n)}(\theta_n, I)$, $p(n) \to \infty$ with $n$ (MVN)
2. The Threshold Selection Problem: in (MVN) with, say, $\hat\theta_i = X_i\, I\{|X_i| > \hat t\}$. How to select $\hat t = \hat t(X)$ reliably?
3. Large Covariance Matrices: $X_n \sim N_{p(n)}(I \otimes \Sigma_{p(n)})$; especially $X_n = [Y_n\ Z_n]$. Spectral properties of $n^{-1} X_n X_n^T$ ⇒ PCA, CCA, MANOVA
