2 Collaborators: Felix Abramovich, Yoav Benjamini, David Donoho, Noureddine El Karoui, Peter Forrester, Gérard Kerkyacharian, Debashis Paul, Dominique Picard, Bernard Silverman
Thanks. Presentation help: B. Narasimhan, N. El Karoui, J-M Corcuera. Grant support: NIH, NSF, ARC (via P. Hall) p.2
3 Abraham Wald p.3
4 Three Talks
1. Function Estimation & Classical Normal Theory: X_n ~ N_{p(n)}(θ_n, I), p(n) → ∞ with n (MVN)
2. The Threshold Selection Problem: in (MVN) with, say, θ̂_i = X_i I{|X_i| > t̂}, how to select t̂ = t̂(X) reliably?
3. Large Covariance Matrices: X_n ~ N_{p(n)}(I ⊗ Σ_{p(n)}); especially X_n = [Y_n Z_n]; spectral properties of n⁻¹ X_n X_nᵀ; PCA, CCA, MANOVA p.4
5 Focus on Gaussian Models Cue from Wald (1943), Trans. Amer. Math. Soc., an inspiration for Le Cam's theory of Local Asymptotic Normality (LAN) p.5
6 Focus on Gaussian Models, ctd. Growing models: p = p(n) → ∞ with n is now commonplace (classification, genomics, ...) Reality check: But no real data is Gaussian... Yes (in many fields), and yet, consider the power of fable and fairy tale... p.6
7 1. Function Estimation and Classical Normal Theory
Theme: Gaussian White Noise model & multiresolution point of view allow classical parametric theory of N_d(θ, I) to be exploited in nonparametric function estimation
Reference: book manuscript in (non-)progress: Function Estimation and Gaussian Sequence Models, www-stat.stanford.edu/~imj p.7
8 Agenda Classical Parametric Ideas Nonparametric Estimation and Growing Gaussian Models I. Kernel Estimation and James-Stein Shrinkage II. Thresholding and Sparsity III. Bernstein-von Mises phenomenon p.8
9 Classical Ideas
a) Multinormal shift model
X_1, ..., X_n data from P_η(dx) = f_η(x) µ(dx), η ∈ H ⊂ ℝᵈ. Let I_0 = Fisher information matrix at η_0.
Local asymptotic Gaussian approximation:
{P^n_{η_0 + θ/√n}, θ ∈ ℝᵈ} ≈ {N_d(θ, I_0⁻¹), θ ∈ ℝᵈ} p.9
10 Classical Ideas, ctd
b) ANOVA, Projections and Model Selection
Y = Xβ + σɛ, with Y: n×1, X: n×d, β: d×1
Submodels: X = [X_0 X_1], projections P_{X_0}. Canonical form: y = θ + σz.
Projections: (P_0 y)_i = y_i if i ∈ I_0, 0 otherwise
MSE: E‖P_0 y − θ‖² = Σ_{i∈I_0} σ² + Σ_{i∉I_0} θ_i² p.10
11 Classical Ideas, ctd
c) Minimax estimation of θ:
inf_θ̂ sup_{θ∈ℝᵈ} E‖θ̂(y) − θ‖² = dσ², attained by the MLE θ̂_MLE(y) = y.
d) James-Stein estimate:
θ̂_JS(y) = (1 − (d − 2)/‖y‖²) y dominates the MLE if d ≥ 3. p.11
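The dominance claim is easy to check by simulation. The sketch below (an added illustration, not part of the slides) compares the MLE and James-Stein risks at θ = 0, where the gap is largest: the MLE risk equals d while the JS risk is close to 2.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 50, 4000
theta = np.zeros(d)          # null signal: JS savings are largest here

def james_stein(y):
    # (1 - (d-2)/||y||^2) y, dominating the MLE y for d >= 3
    return (1 - (len(y) - 2) / np.sum(y ** 2)) * y

mle_mse = js_mse = 0.0
for _ in range(reps):
    y = theta + rng.standard_normal(d)
    mle_mse += np.sum((y - theta) ** 2) / reps
    js_mse += np.sum((james_stein(y) - theta) ** 2) / reps

print(round(mle_mse), round(js_mse))   # approximately d = 50 versus 2
```

At a nonzero θ the gap narrows, but the JS risk never exceeds the MLE risk in expectation.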
12 Classical Ideas, ctd
e) Conjugate priors
Prior: θ ~ N(0, τ²I); likelihood: y | θ ~ N(θ, σ²I)
Posterior: θ | y ~ N(θ̂_y, σ²_y I), with θ̂_y = (τ²/(σ² + τ²)) y, σ²_y = σ²τ²/(σ² + τ²)
f) Unbiased risk estimate
Y ~ N_d(θ, I). For g (weakly) differentiable,
E‖Y + g(Y) − θ‖² = E[d + 2∇ᵀg(Y) + ‖g(Y)‖²] = E[U(Y)]
If g = g(Y; t) then U = U(Y; t). p.12
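The unbiased-risk identity can be verified numerically. A minimal sketch (added illustration) with the linear choice g(y) = −λy, for which ∇ᵀg(Y) = −λd and hence U(Y) = d − 2λd + λ²‖Y‖²:

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps, lam = 20, 20000, 0.3
theta = rng.standard_normal(d)           # arbitrary fixed mean vector

loss_sum = sure_sum = 0.0
for _ in range(reps):
    y = theta + rng.standard_normal(d)
    g = -lam * y                         # weakly differentiable; div g = -lam * d
    loss_sum += np.sum((y + g - theta) ** 2)
    sure_sum += d + 2 * (-lam * d) + lam ** 2 * np.sum(y ** 2)

# both Monte Carlo averages estimate the same quantity,
# E||Y + g(Y) - theta||^2 = (1 - lam)^2 d + lam^2 ||theta||^2
print(loss_sum / reps, sure_sum / reps)
```

The second average never touches θ, which is the point: U(Y) is computable from the data alone, so it can drive data-based choices of the tuning index t in g(Y; t).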
13 Agenda Classical Parametric Ideas Nonparametric Estimation and Growing Gaussian Models I. Kernel Estimation and James-Stein Shrinkage II. Thresholding and Sparsity III. Bernstein-von Mises phenomenon p.13
14 Nonparametric function estimation
Fix and Hodges (1951): nonparametric classification; spectrum estimation in time series; kernel methods for density estimation, regression; roughness penalty methods (splines). Techniques largely differ from parametric normal theory.
Ibragimov & Hasminskii (and school): importance of the Gaussian white noise (GWN) model:
Y_t = ∫₀ᵗ f(s) ds + ɛ W_t, 0 ≤ t ≤ 1 p.14
15 Motivation for GWN model
GWN model emerges as large-sample limit of equispaced regression, density estimation, spectrum estimation, ...
y_j = f(j/n) + σ w_j, j = 1, ..., n; ɛ = σ/√n
(1/n) Σ_{j=1}^{[nt]} y_j = (1/n) Σ_{j=1}^{[nt]} f(j/n) + (σ/√n) · n^{−1/2} Σ_{j=1}^{[nt]} w_j
→ Y_t = ∫₀ᵗ f(s) ds + (σ/√n) W_t p.15
18 Series form of WN Model
Y_t = ∫₀ᵗ f(s) ds + ɛ W_t
For any orthonormal basis {ψ_λ, λ ∈ Λ} for L²[0, 1]:
∫ ψ_λ dY_t = ∫ ψ_λ f dt + ɛ ∫ ψ_λ dW_t
y_λ = θ_λ + ɛ z_λ, (z_λ) i.i.d. N(0, 1), λ ∈ Λ
Parseval relation implies ∫(f̂ − f)² = Σ(θ̂_λ − θ_λ)² = ‖θ̂ − θ‖², etc.
⇒ analysis of infinite sequences in ℓ²(ℕ) p.16
19 Multiresolution connection
Wavelet orthonormal bases {ψ_jk} have double index: level ("octave") j = 1, 2, ...; location k = 1, ..., 2ʲ
Collect coefficients in a single level: y_j = (y_jk), θ_j = (θ_jk), k = 1, ..., 2ʲ, etc.
Finite (but growing with j) multivariate normal model: y_j ~ N_{2ʲ}(θ_j, ɛ²I), j = 1, 2, ...
⇒ apply classical normal theory to the vectors y_j. p.17
20 [Figure: Some S8 Symmlets at various scales and locations] p.18
21 Example 600 Electrical Signal Nationwide electricity consumption in France Here: 3 days (Thur, Fri, Sat), summer 1990, t =1min. from Misiti et. al., Rev. Statist. Appliquée 1994, N.B. Malfunction of meters: d. 2, d. 3, p.19
22 [Figure: Electrical Signal, CDJV wavelet coefficients] Often o.k. to treat a single level as N_{2ʲ}(θ_j, ɛ²I) p.20
23 [Figure: Electrical Signal, CDJV wavelet coefficients] Or, apply the N_d(θ, I) model to a block within a level. p.21
24 The Minimax principle
Given loss function L(a, θ) and risk R(θ̂, θ) = E_θ L(θ̂, θ), choose the estimator θ̂ so as to minimize the maximum risk sup_{θ∈Θ} E_θ L(θ̂, θ).
introduced into statistics by Wald
L.J. Savage (1951): "the only rule of comparable generality proposed since [that of] Bayes was published in 1763"
standard complaint: θ̂ depends on Θ: optimizing on the worst case may be irrelevant p.22
25 Simultaneous near-minimaxity
Shift in perspective on minimaxity: fix f̂ in advance; an estimator to be evaluated for many spaces F
compare sup_F R(f̂, f) to R(F) = inf_f̂ sup_F R(f̂, f)
shift from "exact answer to wrong problem" to "approx answer to many (related) problems"
an imperfect, yet serviceable tool p.23
26 Agenda Classical Parametric Ideas Nonparametric Estimation and Growing Gaussian Models I. Kernel Estimation and James-Stein Shrinkage II. Thresholding and Sparsity III. Bernstein-von Mises phenomenon p.24
27 I. Kernel estimation & James-Stein Shrinkage
Priestley-Chao estimator: f̂_h(t) = (1/(Nh)) Σ_{i=1}^N K((t − t_i)/h) Y_i, e.g. K(x) = (1 − x²)⁴₊
Automatic choice of h? Huge literature (e.g. Wand & Jones, 1995)
James-Stein provides a simple, powerful approach p.25
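A direct implementation of the Priestley-Chao estimator with the quartic kernel from the slide (illustrative sketch; the raw kernel integrates to 256/315, so it is rescaled here to integrate to 1):

```python
import numpy as np

def priestley_chao(t, t_obs, y_obs, h):
    """Priestley-Chao estimate f_hat(t) = (1/(N h)) sum_i K((t - t_i)/h) Y_i,
    using K(x) = (1 - x^2)_+^4 normalized so that K integrates to 1."""
    K = lambda x: np.clip(1 - x ** 2, 0, None) ** 4 / (256 / 315)
    w = K((t[:, None] - t_obs[None, :]) / h)     # kernel weights, shape (len(t), N)
    return w @ y_obs / (len(t_obs) * h)

# noiseless sanity check on a linear function: a symmetric kernel reproduces
# linear trends in the interior, up to discretization error
n = 2000
t_obs = np.arange(1, n + 1) / n
f = lambda t: 1 + 2 * t
t_eval = np.array([0.3, 0.5, 0.7])
fh = priestley_chao(t_eval, t_obs, f(t_obs), h=0.05)
assert np.max(np.abs(fh - f(t_eval))) < 0.01
```

With noisy Y_i the same code applies; the whole difficulty, as the slide says, is the automatic choice of h.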
28 Fourier Form of Kernel Smoothing
Convolution ↔ multiplication: DFT → shrink → iDFT
Shrinkage factors s_k(h) ∈ [0, 1] decrease with frequency
Flatten shrinkage in blocks: θ̂_k = (1 − w_j) y_k, k ∈ B_j. How to choose w_j? p.26
29 James-Stein Shrinkage
For y ~ N_d(θ, I) and d ≥ 3:
θ̂_JS+ = (1 − (d − 2)/‖y‖²)₊ y
Unbiased estimate of risk: E‖θ̂_JS − θ‖² = E_θ{d − (d − 2)²/‖y‖²} ≤ d p.27
30 Smoothing via Blockwise James-Stein
Block Fourier or wavelet (here) coefficients: y_j = (y_jk), k = 1, ..., d_j = 2ʲ; θ_j = (θ_jk), etc.
Apply J-S shrinkage to each block, j ≤ log₂ N: [refs]
θ̂_j^JS+ = (1 − β_j ɛ²/‖y_j‖²)₊ y_j
β_j = d_j − 2 (ExactJS) or β_j = d_j + √(2 d_j · 2 log d_j) (ConsJS)
For ConsJS, if θ_j = 0, then P{‖y_j‖² < β_j} → 1, so θ̂_j^JS = 0. p.28
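In sequence-model form, blockwise James-Stein is a few lines. The sketch below (illustrative; the signal scaling and noise level are invented for the demo) applies the positive-part rule with the conservative β_j to dyadic blocks whose energy decays like that of a smooth function:

```python
import numpy as np

def block_js(y, beta, eps2):
    """Positive-part blockwise James-Stein: (1 - beta*eps^2/||y||^2)_+ * y."""
    shrink = max(0.0, 1.0 - beta * eps2 / np.sum(y ** 2))
    return shrink * y

rng = np.random.default_rng(1)
eps = 0.05                                # noise level (illustrative)
mse_js = mse_raw = 0.0
for j in range(1, 11):
    d = 2 ** j
    # hypothetical smooth signal: per-coefficient scale 2^{-(alpha+1/2)j}, alpha = 1
    theta = rng.standard_normal(d) * 2.0 ** (-1.5 * j)
    y = theta + eps * rng.standard_normal(d)
    beta = d + np.sqrt(2 * d * 2 * np.log(d))   # conservative beta_j from the slide
    theta_hat = block_js(y, beta, eps ** 2)
    mse_js += np.sum((theta_hat - theta) ** 2)
    mse_raw += np.sum((y - theta) ** 2)

print(mse_js < mse_raw)   # shrinkage beats the raw coefficients
```

Coarse levels, where ‖y_j‖² far exceeds β_j ɛ², pass almost untouched; fine levels, which are essentially pure noise, are zeroed out, which is exactly the kernel-like behaviour of the next slides.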
31 Block James-Stein imitates Kernel
[Figure] Red: James-Stein equivalent kernels; Blue: actual kernel p.29
33 Properties of Block James-Stein
Block JS imitates kernel smoothing, but near automatic choice of bandwidth; canonical choices of β_j
Easy theoretical analysis from unbiased risk bounds:
E‖θ̂_j^JS − θ_j‖² ≤ 2ɛ² + ‖θ_j‖² ∧ 2ʲɛ²
e.g. below: MSE for Hölder smooth functions: for 0 < δ < 1,
|f(x) − f(y)| ≤ B|x − y|^δ for all x, y (*)
α = r + δ, r ∈ ℕ, D^r f satisfies (*).
In terms of wavelet coefficients: f ∈ H^α(C) ⇔ |θ_jk| ≤ C 2^{−(α+1/2)j} for all j, k p.31
35 A single level determines MSE
From unbiased risk bounds and Hölder smoothness:
E‖θ̂^JS − θ‖² ≤ Σ_{j ≤ j*} [2ɛ² + ‖θ_j‖² ∧ 2ʲɛ²] + Σ_{j > j*} ‖θ_j‖²
f ∈ H^α(C) ⇒ ‖θ_j‖² ≤ C² 2^{−2αj}
[Figure: MSE contribution by level] p.32
37 Growing Gaussians
[Figure: MSE by level] geometric decay of MSE away from the critical level j*
Growing Gaussian aspect: as the noise ɛ = σ/√n decreases, the worst level j* = j*(ɛ, C) increases:
d_{j*} = 2^{j*} = c(C/ɛ)^{2/(2α+1)} p.33
38 MultiJS is rate-adaptive
The optimal rate for H^α(C) is ɛ^{2r}, with r = r(α) = 2α/(2α + 1).
JS risk bounds imply simultaneous near-minimaxity:
sup_{H^α(C)} R(f̂_JS, f) ≤ c C^{2(1−r)} ɛ^{2r} (1 + o(1)) ≍ R(H^α(C)).
For a single, prespecified estimator f̂_JS, valid for all smoothness α ∈ (0, ∞) and bounds C ∈ (0, ∞).
No speed limit to rate of convergence ↔ infinite order kernel (for certain wavelets)
Conclusion follows easily from single-level James-Stein analysis in the multinormal mean model p.34
39 Agenda Classical Parametric Ideas Nonparametric Estimation and Growing Gaussian Models I. Kernel Estimation and James-Stein Shrinkage II. Thresholding and Sparsity III. Bernstein-von Mises phenomenon p.35
40 James-Stein Fails on Sparse Signals
d‖θ‖²/(d + ‖θ‖²) ≤ R(θ̂_JS, θ) ≤ 2 + d‖θ‖²/(d + ‖θ‖²)
r-spike θ_r: r co-ords at √(d/r), ‖θ‖² = d. So R(θ̂_JS, θ_r) ≥ d/2.
... Hard thresholding at t = √(2 log d):
R(θ̂_HT, θ_r) ≤ d · 2tφ(t) + r[1 + η(d/r)] ≈ r + 3√(2 log d) p.36
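A simulation of the r-spike example (added illustration): with d = 1000 and r = 10 spikes of height √(d/r), James-Stein is stuck near d/2 while hard thresholding at √(2 log d) pays only a small price:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, reps = 1000, 10, 200
t = np.sqrt(2 * np.log(d))             # threshold sqrt(2 log d), about 3.72
theta = np.zeros(d)
theta[:r] = np.sqrt(d / r)             # r-spike: r coords at sqrt(d/r), ||theta||^2 = d

js_mse = ht_mse = 0.0
for _ in range(reps):
    y = theta + rng.standard_normal(d)
    js = (1 - (d - 2) / np.sum(y ** 2)) * y     # James-Stein: one global shrink factor
    ht = y * (np.abs(y) > t)                    # hard thresholding: per-coordinate
    js_mse += np.sum((js - theta) ** 2) / reps
    ht_mse += np.sum((ht - theta) ** 2) / reps

print(round(js_mse), round(ht_mse))    # JS near d/2 = 500; HT far smaller
```

The single global shrinkage factor is the culprit: ‖y‖² ≈ 2d forces JS to halve the large spike coordinates, while thresholding treats each coordinate on its own.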
42 A more systematic story for thresholding
Based on ℓ_p norms and balls: ‖θ‖_p^p = Σ_{i=1}^d |θ_i|^p (mostly with p < 2)
1. ℓ_p norms capture sparsity
2. thresholding arises from least squares estimation with ℓ_p constraints
3. describe best possible estimation over ℓ_p balls
4. show that thresholding (nearly) attains the best possible p.37
44 ℓ_p norms, p < 2, capture sparsity
[Figure] ℓ₁ ball smaller than ℓ₂ ⇒ expect better estimation
Sparsity of representation depends on basis: rotation sends (0, C, 0, ...) → (C/√d, ..., C/√d) p.38
45 Thresholding from constrained least squares
min Σ (y_i − θ_i)² s.t. Σ |θ_i|^p ≤ C^p, i.e. min Σ (y_i − θ_i)² + λ Σ |θ_i|^p, leads to:
p = 2: linear shrinkage, θ̂_i = (1 + λ)⁻¹ y_i
p = 1: soft thresholding, θ̂_i = y_i − λ̄ if y_i > λ̄; 0 if |y_i| ≤ λ̄; y_i + λ̄ if y_i < −λ̄ [λ̄ = λ/2]
p = 0: hard thresholding, θ̂_i = y_i I{|y_i| > λ} [penalty λ Σ I{θ_i ≠ 0}] p.39
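The three closed forms can be checked against brute-force minimization of the penalized criterion, one coordinate at a time (illustrative sketch; note the soft threshold sits at λ/2 because of how the penalty is parametrized on the slide):

```python
import numpy as np

def penalized_ls(y, lam, p):
    """Closed-form minimizer of (y - t)^2 + lam * pen_p(t) for p in {2, 1, 0}."""
    if p == 2:                       # ridge penalty: linear shrinkage
        return y / (1 + lam)
    if p == 1:                       # l1 penalty: soft threshold at lam/2
        return np.sign(y) * max(abs(y) - lam / 2, 0.0)
    if p == 0:                       # l0 penalty: keep y iff it beats t = 0, i.e. y^2 > lam
        return y if y ** 2 > lam else 0.0

# brute-force check of each closed form on a fine grid
grid = np.linspace(-5, 5, 200001)
lam = 1.0
for p in (2, 1, 0):
    for y in (-2.3, 0.4, 1.7):
        pen = lam * (grid != 0) if p == 0 else lam * np.abs(grid) ** p
        best = grid[np.argmin((y - grid) ** 2 + pen)]
        assert abs(penalized_ls(y, lam, p) - best) < 1e-3
print("closed forms match grid search")
```

The qualitative difference is visible in the formulas themselves: p = 2 shrinks every coordinate by the same factor, while p ≤ 1 produces exact zeros, which is why the ℓ_p story with p < 2 is the natural home for sparsity.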
46 Bounds on best estimation on ℓ_p balls
Minimax risk: R_{d,p}(C) = inf_θ̂ sup_{‖θ‖_p ≤ C} E_θ Σ_{i=1}^d (θ̂_i − θ_i)².
Non-asymptotic bounds [Birgé-Massart]:
c₁ r_{d,p}(C) ≤ R_{d,p}(C) ≤ c₂ [log d + r_{d,p}(C)] (upper bound attained by thresholding)
ℓ₁ ball smaller ⇒ (much) smaller minimax risk p.40
48 Thresholding nearly attains the bound
For simplicity: threshold t = √(2 log d)
Oracle inequality for soft thresholding [non-asymptotic!]:
E‖θ̂_ST − θ‖² ≤ (2 log d + 1)[1 + Σ (θ_i² ∧ 1)]
Apply to the ℓ₁ ball {θ: ‖θ‖₁ ≤ C}:
sup_{‖θ‖₁ ≤ C} E‖θ̂_ST − θ‖² ≤ (2 log d + 1)(C + 1).
Better bounds possible for data-dependent thresholds t = t(Y) (Lec. 2) p.41
49 Summary for N_d(θ, I)
For X ~ N_d(θ, I), comparing James-Stein shrinkage (β = d − 2) and soft thresholding at t = √(2 log d):
James-Stein shrinkage, orthogonally invariant:
½ (‖θ‖² ∧ d) ≤ E‖θ̂_JS − θ‖² ≤ 2 + (‖θ‖² ∧ d)
Thresholding, co-ordinatewise and co-ordinate dependent:
½ Σ (θ_i² ∧ 1) ≤ E‖θ̂_ST − θ‖² ≤ (2 log d + 1)(1 + Σ (θ_i² ∧ 1)) p.42
50 Implications for Function Estimation
Key example: functions of bounded total variation: TV(C) = {f: ‖f‖_TV ≤ C}
‖f‖_TV = sup_{t₁ < ... < t_N} Σ_{i=1}^{N−1} |f(t_{i+1}) − f(t_i)| + ‖f‖₁
Well captured by weighted combinations of ℓ₁ norms on wavelet coefficients:
c₁ sup_j 2^{j/2} ‖θ_j‖₁ ≤ ‖f‖_TV ≤ c₂ Σ_j 2^{j/2} ‖θ_j‖₁
Best possible (minimax) MSE: R(TV(C), ɛ) = inf_f̂ sup_{f ∈ TV(C)} E‖f̂ − f‖² p.43
51 Reduction to single levels
sup_θ E‖θ̂ − θ‖² = Σ_j sup_{θ_j} E‖θ̂_j − θ_j‖²
Apply thresholding bounds for each j, to N_{2ʲ}(θ_j, ɛ²I):
a worst-case level j* = j*(ɛ, C); geometric decay of sup E‖θ̂_j − θ_j‖² as j moves away from j*.
Growing Gaussian aspect: as the noise ɛ = σ/√n decreases, the worst level j* = j*(ɛ, C) increases: d_{j*} = 2^{j*} = c(C/ɛ)^{2/3} p.44
52 Final Result
sup_{TV(C)} R(f̂_thr, f) ≤ C^{2(1−r)} ɛ^{2r}, r = 2/3
sup_{TV(C)} R(f̂_JS, f) ≍ C^{2(1−r_L)} ɛ^{2r_L}, r_L = 1/2
[Figure: Blocks signal; noisy version; √(2 log d) threshold; Block JS shrinkage] p.45
53 Agenda Classical Parametric Ideas Nonparametric Estimation and Growing Gaussian Models I. Kernel Estimation and James-Stein Shrinkage II. Thresholding and Sparsity III. Bernstein-von Mises phenomenon p.46
54 Bernstein-von Mises Phenomenon
Asymptotic match of frequentist & Bayesian confidence intervals. Classical version: Y₁, ..., Y_n i.i.d. p_θ(y)dµ, θ ∈ Θ ⊂ ℝᵈ, d fixed.
Under mild conditions, as n → ∞,
‖P_{θ|Y} − N(θ̂_MLE, n⁻¹ I_{θ₀}⁻¹)‖ → 0 in P_{n,θ₀}-probability,
where I_θ = E_θ[∇_θ log p_θ][∇_θ log p_θ]ᵀ is the Fisher information, and ‖P − Q‖ = max_A |P(A) − Q(A)| is total variation distance p.47
55 Non-parametric Regression
Y_i = f(i/n) + σ₀ w_i, i = 1, ..., n
For typical smoothness priors (e.g. d-th integrated Wiener process prior), Bernstein-von Mises fails [Dennis Cox, D. A. Freedman]: center and scale mismatch in the frequentist & posterior laws of ‖f̂ − f‖₂².
Here, revisit via an elementary Growing Gaussian Model approach; apply to single levels of the wavelet decomposition p.48
56 Growing Gaussian Model
Data: Y₁, ..., Y_n | θ i.i.d. N_p(θ, σ₀² I). Prior: θ ~ N_p(0, τ_n² I).
growing: p = p(n) → ∞ with n; σ_n² = Var(Ȳ_n | θ) = σ₀²/n; τ_n² may depend on n.
Goal: compare L(θ̂_MLE − θ), L(θ̂_Bayes − θ) with L(θ | Y).
Posterior: all Gaussian, so centering and scaling are key:
L(θ | Ȳ = ȳ) = N_p(θ̂_B = w_n ȳ, w_n σ_n² I), w_n = τ_n²/(σ_n² + τ_n²) p.49
57 Growing Gaussians and BvM
Correspondences:
P_{θ|Y} ↔ N_p(w_n ȳ, w_n σ_n² I); N(θ̂_MLE, n⁻¹ I⁻¹) ↔ N_p(ȳ, σ_n² I)
For BvM to hold, now need w_n → 1 sufficiently fast:
Proposition: In the growing Gaussian model, ‖P_{θ|Y} − N(θ̂_MLE, n⁻¹ I⁻¹)‖ → 0 in P_{n,θ₀}-probability if and only if
√p σ_n²/τ_n² = (√p/n) σ₀²/τ_n² → 0, i.e. w_n = 1 − o(1/√p_n). p.50
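Since both laws are Gaussian, their discrepancy has a closed form. The sketch below (an added illustration, using KL divergence as a convenient computable proxy for total variation) shows the condition in action: with τ² fixed, p = n keeps w_n close enough to 1 and the divergence shrinks, while p = n³ makes it blow up.

```python
import numpy as np

def kl_post_freq(p, n, sigma0=1.0, tau2=1.0, seed=0):
    """KL( N_p(w*ybar, w*s2*I) || N_p(ybar, s2*I) ) in the growing Gaussian
    model, at truth theta_0 = 0, for one draw of ybar ~ N_p(0, sigma_n^2 I)."""
    rng = np.random.default_rng(seed)
    s2 = sigma0 ** 2 / n                    # sigma_n^2
    w = tau2 / (s2 + tau2)                  # posterior shrinkage weight w_n
    ybar = rng.standard_normal(p) * np.sqrt(s2)
    # Gaussian KL: (1/2)[p(w - 1 - log w) + (1 - w)^2 ||ybar||^2 / s2]
    return 0.5 * (p * (w - 1 - np.log(w)) + (1 - w) ** 2 * np.sum(ybar ** 2) / s2)

# p = n: sqrt(p) * sigma_n^2 / tau_n^2 -> 0, so BvM holds (divergence -> 0)
small = [kl_post_freq(n, n) for n in (10, 100, 1000)]
# p = n^3: the condition fails and the divergence grows without bound
big = [kl_post_freq(n ** 3, n) for n in (10, 30, 60)]
print(small, big)
```

A Taylor expansion of the KL at w near 1 gives roughly (3/4) p (1 − w_n)², so KL → 0 exactly when √p (1 − w_n) → 0, matching the proposition.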
58 Example: Pinsker priors
Back to regression: dY_t = f(t)dt + σ_n dW_t
Minimax MSE linear estimation of f over {f: ∫₀¹ (D^α f)² ≤ C²}
Least favourable prior on wavelet coeffs (for sample size n): θ_jk indep N(0, τ_j²),
τ_j² = σ_n² (λ_n 2^{−jα} − 1)₊, λ_n = c(C/σ_n)^{2α/(2α+1)}
critical level j* = j*(n, α) grows with n (Growing Gaussian model again) p.51
59 Validity of B-vM depends on level
The Bayes estimator for the Pinsker prior attains the exact minimax MSE (asymptotically).
But Bernstein-von Mises fails at the critical level j*(n):
At j*(n): τ_{j*}²/σ_n² → 2^α − 1 [fine scale features], so 1 − w_n = 1/(1 + τ_{j*}²/σ_n²) → 2^{−α}: BvM fails
At fixed j₀: p = 2^{j₀} FIXED (or growing slowly) [coarse scale]: τ_{j₀}²/σ_n² = λ_n 2^{−j₀α} − 1 → ∞, so 1 − w_n → 0: BvM holds
Difficulty lies with the high-dimensional features. p.52
60 Three Talks
1. Function Estimation & Classical Normal Theory: X_n ~ N_{p(n)}(θ_n, I), p(n) → ∞ with n (MVN)
2. The Threshold Selection Problem: in (MVN) with, say, θ̂_i = X_i I{|X_i| > t̂}, how to select t̂ = t̂(X) reliably?
3. Large Covariance Matrices: X_n ~ N_{p(n)}(I ⊗ Σ_{p(n)}); especially X_n = [Y_n Z_n]; spectral properties of n⁻¹ X_n X_nᵀ; PCA, CCA, MANOVA p.53
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationON BLOCK THRESHOLDING IN WAVELET REGRESSION: ADAPTIVITY, BLOCK SIZE, AND THRESHOLD LEVEL
Statistica Sinica 12(2002), 1241-1273 ON BLOCK THRESHOLDING IN WAVELET REGRESSION: ADAPTIVITY, BLOCK SIZE, AND THRESHOLD LEVEL T. Tony Cai University of Pennsylvania Abstract: In this article we investigate
More information19.1 Maximum Likelihood estimator and risk upper bound
ECE598: Information-theoretic methods in high-dimensional statistics Spring 016 Lecture 19: Denoising sparse vectors - Ris upper bound Lecturer: Yihong Wu Scribe: Ravi Kiran Raman, Apr 1, 016 This lecture
More informationgroup. redshifts. However, the sampling is fairly complete out to about z = 0.2.
Non parametric Inference in Astrophysics Larry Wasserman, Chris Miller, Bob Nichol, Chris Genovese, Woncheol Jang, Andy Connolly, Andrew Moore, Jeff Schneider and the PICA group. 1 We discuss non parametric
More information(a). Bumps (b). Wavelet Coefficients
Incorporating Information on Neighboring Coecients into Wavelet Estimation T. Tony Cai Bernard W. Silverman Department of Statistics Department of Mathematics Purdue University University of Bristol West
More informationBayesian Asymptotics
BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {
More informationInformation Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n
Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n Jiantao Jiao (Stanford EE) Joint work with: Kartik Venkat Yanjun Han Tsachy Weissman Stanford EE Tsinghua
More informationLocal Polynomial Modelling and Its Applications
Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE
Statistica Sinica 9 2009, 885-903 ESTIMATORS FOR GAUSSIAN MODELS HAVING A BLOCK-WISE STRUCTURE Lawrence D. Brown and Linda H. Zhao University of Pennsylvania Abstract: Many multivariate Gaussian models
More informationPenalty Methods for Bivariate Smoothing and Chicago Land Values
Penalty Methods for Bivariate Smoothing and Chicago Land Values Roger Koenker University of Illinois, Urbana-Champaign Ivan Mizera University of Alberta, Edmonton Northwestern University: October 2001
More informationMIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design
MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationThe lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding
Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty
More informationGaussian estimation: Sequence and wavelet models
Gaussian estimation: Sequence and wavelet models Draft version, September 8, 2015 Comments welcome Iain M. Johnstone c2015. Iain M. Johnstone iii Contents (working) Preface List of Notation ix xiv 1 Introduction
More information19.1 Problem setup: Sparse linear regression
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 19: Minimax rates for sparse linear regression Lecturer: Yihong Wu Scribe: Subhadeep Paul, April 13/14, 2016 In
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University
More informationOPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1
The Annals of Statistics 1997, Vol. 25, No. 6, 2512 2546 OPTIMAL POINTWISE ADAPTIVE METHODS IN NONPARAMETRIC ESTIMATION 1 By O. V. Lepski and V. G. Spokoiny Humboldt University and Weierstrass Institute
More informationNon linear estimation in anisotropic multiindex denoising II
Non linear estimation in anisotropic multiindex denoising II Gérard Kerkyacharian, Oleg Lepski, Dominique Picard Abstract In dimension one, it has long been observed that the minimax rates of convergences
More informationwith Current Status Data
Estimation and Testing with Current Status Data Jon A. Wellner University of Washington Estimation and Testing p. 1/4 joint work with Moulinath Banerjee, University of Michigan Talk at Université Paul
More information21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk)
10-704: Information Processing and Learning Spring 2015 Lecture 21: Examples of Lower Bounds and Assouad s Method Lecturer: Akshay Krishnamurthy Scribes: Soumya Batra Note: LaTeX template courtesy of UC
More informationASSESSING A VECTOR PARAMETER
SUMMARY ASSESSING A VECTOR PARAMETER By D.A.S. Fraser and N. Reid Department of Statistics, University of Toronto St. George Street, Toronto, Canada M5S 3G3 dfraser@utstat.toronto.edu Some key words. Ancillary;
More informationLecture Notes 5: Multiresolution Analysis
Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and
More informationHomework # , Spring Due 14 May Convergence of the empirical CDF, uniform samples
Homework #3 36-754, Spring 27 Due 14 May 27 1 Convergence of the empirical CDF, uniform samples In this problem and the next, X i are IID samples on the real line, with cumulative distribution function
More informationRATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1
The Annals of Statistics 1997, Vol. 25, No. 6, 2493 2511 RATES OF CONVERGENCE OF ESTIMATES, KOLMOGOROV S ENTROPY AND THE DIMENSIONALITY REDUCTION PRINCIPLE IN REGRESSION 1 By Theodoros Nicoleris and Yannis
More informationInverse Statistical Learning
Inverse Statistical Learning Minimax theory, adaptation and algorithm avec (par ordre d apparition) C. Marteau, M. Chichignoud, C. Brunet and S. Souchet Dijon, le 15 janvier 2014 Inverse Statistical Learning
More informationMinimax Estimation of a nonlinear functional on a structured high-dimensional model
Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationEstimation of cumulative distribution function with spline functions
INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative
More informationIntroduction to Estimation Methods for Time Series models Lecture 2
Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:
More informationNonparametric Estimation: Part I
Nonparametric Estimation: Part I Regression in Function Space Yanjun Han Department of Electrical Engineering Stanford University yjhan@stanford.edu November 13, 2015 (Pray for Paris) Outline 1 Nonparametric
More information