Shrinkage Estimation in High Dimensions


1 Shrinkage Estimation in High Dimensions
Pavan Srinath and Ramji Venkataramanan
University of Cambridge, ITA 2016

2 The Estimation Problem
$\theta \in \mathbb{R}^n$ is a vector of parameters, to be estimated from an observation $y$:
$$y = \theta + w, \qquad \text{i.e.,} \quad \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_n \end{bmatrix} + \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1)$$
The loss function of an estimator $\hat\theta(y)$ is $\|\theta - \hat\theta(y)\|^2$.
The normalized risk of the estimator is
$$R(\theta, \hat\theta) = \frac{1}{n}\,\mathbb{E}\big[\|\hat\theta(y) - \theta\|^2\big]$$
The expectation is computed with the density $p_\theta(y) = (2\pi)^{-n/2}\, e^{-\|y - \theta\|^2/2}$.
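To make the model concrete, here is a minimal Python sketch of the setup (the helper names `observe` and `normalized_loss` are ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(theta):
    """Draw y = theta + w with w_i i.i.d. N(0, 1)."""
    return theta + rng.standard_normal(theta.size)

def normalized_loss(theta, theta_hat):
    """Normalized squared-error loss ||theta_hat - theta||^2 / n."""
    return float(np.sum((theta_hat - theta) ** 2)) / theta.size

theta = np.concatenate([np.full(5, 2.0), np.full(5, -2.0)])  # an n = 10 example
y = observe(theta)
print(normalized_loss(theta, y))  # loss of the naive estimate theta_hat = y
```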

3 Maximum Likelihood Estimator
$$y = \theta + w, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1)$$
The obvious estimator $\hat\theta(y) = y$ is also the ML estimator. But...
$$R(\theta, \hat\theta_{ML}) = 1 \quad \forall\, \theta \in \mathbb{R}^n$$

4 Maximum Likelihood Estimator
$$y = \theta + w, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1)$$
The obvious estimator $\hat\theta(y) = y$ is also the ML estimator. But...
$$R(\theta, \hat\theta_{ML}) = 1 \quad \forall\, \theta \in \mathbb{R}^n$$
For $n > 2$, there are estimators that do strictly better for all $\theta$ (James-Stein '61).

5 James-Stein Estimator
$$\hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$$
$\hat\theta_{JS}$ shrinks each $y_i$ towards 0. Its risk is
$$R\big(\theta, \hat\theta_{JS}\big) = 1 - \mathbb{E}\left[\frac{(n-2)^2}{n\,\|y\|^2}\right] < 1$$
[Figure: normalized risk $R(\theta,\hat\theta)/n$ vs. $\|\theta\|$ for the ML estimator and the regular JS estimator; $n = 10$, $\theta_i = c$ for $i = 1,\dots,5$ and $\theta_i = -c$ for $i = 6,\dots,10$.]
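A short Monte Carlo sketch (our own illustration under the model above, not code from the talk) that reproduces the qualitative comparison in the figure:

```python
import numpy as np

rng = np.random.default_rng(1)

def james_stein(y):
    """Regular James-Stein estimate: shrink y towards the all-zeros vector."""
    n = y.size
    return (1.0 - (n - 2) / np.sum(y ** 2)) * y

def mc_risk(estimator, theta, trials=10000):
    """Monte Carlo estimate of the normalized risk E[||theta_hat - theta||^2]/n."""
    n = theta.size
    total = 0.0
    for _ in range(trials):
        y = theta + rng.standard_normal(n)
        total += np.sum((estimator(y) - theta) ** 2) / n
    return total / trials

theta = np.full(10, 0.5)
print(mc_risk(lambda y: y, theta))   # ML: close to 1 for every theta
print(mc_risk(james_stein, theta))   # JS: strictly below 1 for n > 2
```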

6 Intuition
$$y = \theta + w, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1), \qquad \hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$$
Why should the estimate of $\theta_1$ depend on all the $y_i$'s?

7 Intuition
$$y = \theta + w, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1), \qquad \hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$$
Why should the estimate of $\theta_1$ depend on all the $y_i$'s?
Note that the loss function is
$$\frac{1}{n}\Big( (\theta_1 - \hat\theta_1)^2 + \cdots + (\theta_n - \hat\theta_n)^2 \Big)$$

8 Intuition
$$y = \theta + w, \qquad w_i \ \text{i.i.d.} \ \mathcal{N}(0,1), \qquad \hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$$
Empirical Bayes: If we assume $\theta_i$ i.i.d. $\mathcal{N}(0, A)$, then
$$\hat\theta_{MMSE} = \left(1 - \frac{1}{1+A}\right) y$$
If we don't know $A$, estimate $\frac{1}{1+A}$ by $\frac{n-2}{\|y\|^2}$, since $\mathbb{E}\left[\frac{n-2}{\|y\|^2}\right] = \frac{1}{1+A}$.
This gives $\hat\theta_{JS}$, but the surprise is that it beats ML for any $\theta$!
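The expectation claim is a standard chi-squared fact, spelled out here since the slide compresses it: under the prior, marginally $y_i \sim \mathcal{N}(0, 1+A)$, so $\|y\|^2/(1+A) \sim \chi^2_n$, and $\mathbb{E}[1/\chi^2_n] = 1/(n-2)$ for $n > 2$. Hence
$$\mathbb{E}\left[\frac{n-2}{\|y\|^2}\right] = \frac{n-2}{1+A}\,\mathbb{E}\left[\frac{1}{\chi^2_n}\right] = \frac{n-2}{1+A}\cdot\frac{1}{n-2} = \frac{1}{1+A}.$$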

9 The Attracting Vector
$$\hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$$
$\hat\theta_{JS}$ shrinks $y$ towards the all-zeros vector $\mathbf{0}$.
The risk is smaller when $\theta$ is closer to $\mathbf{0}$, i.e., when $\|\theta\|$ is small.
[Figure: normalized risk $R(\theta,\hat\theta)/n$ vs. $\|\theta\|$ for the ML estimator and the regular JS estimator.]

10 The Attracting Vector
In general, one can shrink towards any vector $\alpha$, e.g.,
$$\hat\theta = \alpha + \left[1 - \frac{n-2}{\|y - \alpha\|^2}\right](y - \alpha)$$
The risk of $\hat\theta$ decreases as $\|\theta - \alpha\|$ gets smaller.

11 The Attracting Vector
In general, one can shrink towards any vector $\alpha$, e.g.,
$$\hat\theta = \alpha + \left[1 - \frac{n-2}{\|y - \alpha\|^2}\right](y - \alpha)$$
The risk of $\hat\theta$ decreases as $\|\theta - \alpha\|$ gets smaller.
Lindley's estimator: Based on the assumption that the $\theta_i$'s lie close to their average $\bar\theta$ ($\approx \bar y$):
$$\hat\theta_L = \bar y\,\mathbf{1} + \left[1 - \frac{n-3}{\|y - \bar y\,\mathbf{1}\|^2}\right](y - \bar y\,\mathbf{1}), \qquad \bar y = \frac{1}{n}\sum_i y_i$$
$\hat\theta_L$ has been applied to baseball data, disease incidence data, ... [Efron-Morris '75]
The risk reduction of a JS-like estimator over ML is greatest when $\theta$ is close to the attracting vector.

12 Positive-Part Version
$$\hat\theta_L = \bar y\,\mathbf{1} + \left[1 - \frac{n-3}{\|y - \bar y\,\mathbf{1}\|^2}\right]_+ (y - \bar y\,\mathbf{1}), \qquad \bar y = \frac{1}{n}\sum_i y_i$$
When the shrinkage factor becomes negative, replace it by 0.
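In code, the positive-part version is a one-line change from plain shrinkage; a minimal sketch (the function name is ours):

```python
import numpy as np

def lindley_plus(y):
    """Positive-part Lindley estimator: shrink y towards its empirical mean,
    clipping the shrinkage factor at 0 when it would go negative."""
    n = y.size
    ybar = y.mean()
    factor = 1.0 - (n - 3) / np.sum((y - ybar) ** 2)
    return ybar + max(factor, 0.0) * (y - ybar)
```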

13 Positive-Part Version
$$\hat\theta_L = \bar y\,\mathbf{1} + \left[1 - \frac{n-3}{\|y - \bar y\,\mathbf{1}\|^2}\right]_+ (y - \bar y\,\mathbf{1}), \qquad \bar y = \frac{1}{n}\sum_i y_i$$
When the shrinkage factor becomes negative, replace it by 0.
[Figure: $n = 10$, $\theta_i = c$ for all $i$; normalized risk $R(\theta,\hat\theta)/n$ vs. $\|\theta\|$ for Lindley's estimator (positive part), the ML estimator, and the JS estimator (positive part).]
$\hat\theta_L$ does best when the $\theta_i$'s are close to their empirical mean $\bar\theta$.

14 Another example
[Figure: $n = 10$, $\theta_i = c$ for $i \le 5$ and $\theta_i = -c$ for $6 \le i \le 10$; normalized risk $R(\theta,\hat\theta)/n$ vs. $\|\theta\|$ for the JS estimator (positive part), Lindley's estimator (positive part), and the ML estimator.]
What would be a good attracting vector for this example?
Shrink the $y_i$'s that are $> \bar y$ towards $c$, the rest towards $-c$.
But we don't know anything about $\theta$!

15 [Figure: two panels of components $\theta_i$ vs. $i$; left: components of $\theta$ in 2 clusters; right: components of $\theta$ in 1 cluster.]
In the absence of prior information about $\theta$, how do we choose an attracting vector that is close to $\theta$?
Can we use the data to pick a good attracting vector tailored to the underlying $\theta$?

16 [Figure: two panels of components $\theta_i$ vs. $i$; left: components of $\theta$ in 2 clusters; right: components of $\theta$ in 1 cluster.]
In the absence of prior information about $\theta$, how do we choose an attracting vector that is close to $\theta$?
Can we use the data to pick a good attracting vector tailored to the underlying $\theta$?
Idea: 1) Design a good estimator $\hat\theta_2$ for $\theta$'s whose components are roughly separable into two clusters.
2) Then, from $y$, try to infer which is better: $\hat\theta_2$ or $\hat\theta_L$.

17 A Two-Cluster Estimator
Define two clusters $C_1 := \{y_i : y_i > \bar y\}$ and $C_2 := \{y_i : y_i \le \bar y\}$.
Shrink the $y_i$'s in $C_1$ towards $a_1$, the rest towards $a_2$.

18 A Two-Cluster Estimator
Define two clusters $C_1 := \{y_i : y_i > \bar y\}$ and $C_2 := \{y_i : y_i \le \bar y\}$.
Shrink the $y_i$'s in $C_1$ towards $a_1$, the rest towards $a_2$.
Attracting vector:
$$\nu_2 = a_1 \begin{bmatrix} \mathbf{1}\{y_1 > \bar y\} \\ \vdots \\ \mathbf{1}\{y_n > \bar y\} \end{bmatrix} + a_2 \begin{bmatrix} \mathbf{1}\{y_1 \le \bar y\} \\ \vdots \\ \mathbf{1}\{y_n \le \bar y\} \end{bmatrix}$$
The estimator is
$$\hat\theta_2 = \nu_2 + \left[1 - \frac{n}{\|y - \nu_2\|^2}\right]_+ (y - \nu_2)$$
How to choose $a_1$ and $a_2$?

19 The Ideal Attractors
The attracting vector $\nu_2$ lies in a 2-d subspace, spanned by $[\mathbf{1}\{y_1 > \bar y\}, \dots, \mathbf{1}\{y_n > \bar y\}]^T$ and $[\mathbf{1}\{y_1 \le \bar y\}, \dots, \mathbf{1}\{y_n \le \bar y\}]^T$.
Key Idea: The best attractor is the vector in this subspace that is closest to $\theta$.
Projection of $\theta$ onto this subspace:

20 The Ideal Attractors
The attracting vector $\nu_2$ lies in a 2-d subspace, spanned by $[\mathbf{1}\{y_1 > \bar y\}, \dots, \mathbf{1}\{y_n > \bar y\}]^T$ and $[\mathbf{1}\{y_1 \le \bar y\}, \dots, \mathbf{1}\{y_n \le \bar y\}]^T$.
Key Idea: The best attractor is the vector in this subspace that is closest to $\theta$.
Projection of $\theta$ onto this subspace:
$$\nu_2^* = a_1^* \begin{bmatrix} \mathbf{1}\{y_1 > \bar y\} \\ \vdots \\ \mathbf{1}\{y_n > \bar y\} \end{bmatrix} + a_2^* \begin{bmatrix} \mathbf{1}\{y_1 \le \bar y\} \\ \vdots \\ \mathbf{1}\{y_n \le \bar y\} \end{bmatrix}$$
$$a_1^* = \frac{\sum_{i=1}^n \theta_i\, \mathbf{1}\{y_i > \bar y\}}{\sum_{i=1}^n \mathbf{1}\{y_i > \bar y\}}, \qquad a_2^* = \frac{\sum_{i=1}^n \theta_i\, \mathbf{1}\{y_i \le \bar y\}}{\sum_{i=1}^n \mathbf{1}\{y_i \le \bar y\}}$$
$a_1^*$ and $a_2^*$ cannot be computed, but can we estimate them from $y$?
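Why the projection takes this form (a standard least-squares step, added for completeness): the two indicator vectors have disjoint supports and are therefore orthogonal, so the projection coefficients decouple,
$$a_k^* = \frac{\langle \theta, u_k \rangle}{\|u_k\|^2}, \qquad u_1 = \big[\mathbf{1}\{y_i > \bar y\}\big]_{i=1}^n, \quad u_2 = \big[\mathbf{1}\{y_i \le \bar y\}\big]_{i=1}^n,$$
i.e., $a_k^*$ is simply the average of the $\theta_i$'s whose observations fall in cluster $C_k$.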

21 Estimating the Attractors
$$\hat a_1 = \frac{\sum_{i=1}^n y_i\, \mathbf{1}\{y_i > \bar y\} - \frac{1}{2\delta}\sum_{i=1}^n \mathbf{1}\{|y_i - \bar y| \le \delta\}}{\sum_{i=1}^n \mathbf{1}\{y_i > \bar y\}}, \qquad \hat a_2 = \frac{\sum_{i=1}^n y_i\, \mathbf{1}\{y_i \le \bar y\} + \frac{1}{2\delta}\sum_{i=1}^n \mathbf{1}\{|y_i - \bar y| \le \delta\}}{\sum_{i=1}^n \mathbf{1}\{y_i \le \bar y\}}$$
$\delta$ is a constant that should be chosen small, but $\gg 1/n$.
Concentration of $\hat a_1, \hat a_2$: For any $0 < \epsilon < 1$,
$$P\big(|\hat a_1 - a_1^*| \ge \kappa_1\delta + o(\delta) + \epsilon\big) \le K e^{-nk\epsilon^2}, \qquad P\big(|\hat a_2 - a_2^*| \ge \kappa_2\delta + o(\delta) + \epsilon\big) \le K' e^{-nk'\epsilon^2}$$
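Putting slides 18 and 21 together, a sketch of the full two-cluster estimator as we read the formulas (the function name and default $\delta$ are ours, and the $\frac{1}{2\delta}$ correction term follows our reconstruction of the garbled source):

```python
import numpy as np

def two_cluster_js(y, delta=0.1):
    """Two-cluster James-Stein sketch: estimate the attractors a1, a2,
    build the attracting vector nu2, then apply positive-part shrinkage."""
    n = y.size
    ybar = y.mean()
    upper = y > ybar                 # cluster C1
    lower = ~upper                   # cluster C2
    # kernel-style correction for the bias of averaging y_i near the threshold
    corr = np.sum(np.abs(y - ybar) <= delta) / (2 * delta)
    a1 = (np.sum(y[upper]) - corr) / max(upper.sum(), 1)
    a2 = (np.sum(y[lower]) + corr) / max(lower.sum(), 1)
    nu2 = np.where(upper, a1, a2)
    factor = 1.0 - n / np.sum((y - nu2) ** 2)
    return nu2 + max(factor, 0.0) * (y - nu2)
```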

22 Risk of Two-Cluster Estimator
$$\hat\theta_2 = \nu_2 + \left[1 - \frac{n}{\|y - \nu_2\|^2}\right]_+ (y - \nu_2)$$
Theorem: The loss function of the two-cluster estimator $\hat\theta_2$ satisfies:
(1) For any $0 < \epsilon < 1$,
$$P\left(\left|\frac{1}{n}\|\theta - \hat\theta_2\|^2 - \left[\min\left(\beta_n,\ \frac{\beta_n}{\alpha_n + 1}\right) + \kappa_n\delta + o(\delta)\right]\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}$$
where $\alpha_n, \beta_n$ are explicit constants that depend on $\theta$.

23 Risk of Two-Cluster Estimator
$$\hat\theta_2 = \nu_2 + \left[1 - \frac{n}{\|y - \nu_2\|^2}\right]_+ (y - \nu_2)$$
Theorem: The loss function of the two-cluster estimator $\hat\theta_2$ satisfies:
(1) For any $0 < \epsilon < 1$,
$$P\left(\left|\frac{1}{n}\|\theta - \hat\theta_2\|^2 - \left[\min\left(\beta_n,\ \frac{\beta_n}{\alpha_n + 1}\right) + \kappa_n\delta + o(\delta)\right]\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}$$
where $\alpha_n, \beta_n$ are explicit constants that depend on $\theta$.
(2) For a sequence of $\theta$ with increasing dimension $n$, if $\limsup_n \frac{\|\theta\|^2}{n} < \infty$, we have
$$\lim_n \left(\frac{1}{n} R\big(\theta, \hat\theta_2\big) - \left[\min\left(\beta_n,\ \frac{\beta_n}{\alpha_n + 1}\right) + \kappa_n\delta + o(\delta)\right]\right) = 0$$

24 Risk of Lindley's Estimator
Note that $\hat\theta_L$ is a one-cluster estimator: the best attractor in the 1-d subspace spanned by $\mathbf{1}$ is $\bar\theta\,\mathbf{1}$; $\hat\theta_L$ approximates it by $\bar y\,\mathbf{1}$.
Corollary: The loss function of Lindley's estimator satisfies:
(1) For any $0 < \epsilon < 1$,
$$P\left(\left|\frac{\|\theta - \hat\theta_L\|^2}{n} - \frac{\rho_n}{\rho_n + 1}\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}, \qquad \text{where } \rho_n = \frac{\|\theta - \bar\theta\,\mathbf{1}\|^2}{n}.$$
(2) If $\limsup_n \frac{\|\theta\|^2}{n} < \infty$, we have
$$\lim_n \left(\frac{1}{n} R\big(\theta, \hat\theta_L\big) - \frac{\rho_n}{\rho_n + 1}\right) = 0.$$

25 Picking the Better Estimator
Depending on $\theta$, either $\hat\theta_2$ or $\hat\theta_L$ may have lower risk.
[Figure: two panels of components $\theta_i$ vs. $i$; left (two clusters): expect $\hat\theta_2$ to be better; right (one cluster): expect $\hat\theta_L$ to be better.]
For $\theta$ whose components are roughly separable into two clusters, expect $\hat\theta_2$ to have lower risk than $\hat\theta_L$.
For $\theta$ whose components are clustered around one value, expect $\hat\theta_L$ to have lower risk.
How to pick the better estimator?

26 Picking the Better Estimator
Depending on $\theta$, either $\hat\theta_2$ or $\hat\theta_L$ may have lower risk.
[Figure: two panels of components $\theta_i$ vs. $i$; left (two clusters): expect $\hat\theta_2$ to be better; right (one cluster): expect $\hat\theta_L$ to be better.]
For $\theta$ whose components are roughly separable into two clusters, expect $\hat\theta_2$ to have lower risk than $\hat\theta_L$.
For $\theta$ whose components are clustered around one value, expect $\hat\theta_L$ to have lower risk.
How to pick the better estimator?
IDEA: Estimate the loss of each estimator from the data.

27 Hybrid Estimator
Loss Estimates:
$$\hat L(\theta, \hat\theta_L) = 1 - \frac{1}{g\big(\|y - \bar y\,\mathbf{1}\|^2/n\big)}$$
$$\hat L(\theta, \hat\theta_2) = 1 - \frac{1 - \frac{1}{n\delta}\sum_{i=1}^n \mathbf{1}\{|y_i - \bar y| \le \delta\}\,(\hat a_1 - \hat a_2)}{g\big(\|y - \nu_2\|^2/n\big)}$$
where $g(x) = \max\{x, 1\}$.
Hybrid Estimator: Choose
$$\hat\theta_{hyb} = \begin{cases} \hat\theta_L & \text{if } \hat L(\theta, \hat\theta_L) \le \hat L(\theta, \hat\theta_2), \\ \hat\theta_2 & \text{otherwise.} \end{cases}$$
This is different from the approach in [George '86]: convex combinations of multiple shrinkage estimators with fixed attracting subspaces.
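A sketch of the selection rule, reusing `lindley_plus` and the two-cluster construction from the earlier sketches (names and the default $\delta$ are ours; the loss-estimate formulas follow our reading of the garbled source):

```python
import numpy as np

def g(x):
    """Clipping function from the slides: g(x) = max{x, 1}."""
    return max(x, 1.0)

def hybrid_js(y, delta=0.1):
    """Hybrid estimator sketch: estimate the losses of Lindley's estimator and
    of the two-cluster estimator from the data, return whichever looks better."""
    n = y.size
    ybar = y.mean()
    theta_L = lindley_plus(y)        # from the earlier sketch
    upper = y > ybar
    corr = np.sum(np.abs(y - ybar) <= delta) / (2 * delta)
    a1 = (np.sum(y[upper]) - corr) / max(upper.sum(), 1)
    a2 = (np.sum(y[~upper]) + corr) / max((~upper).sum(), 1)
    nu2 = np.where(upper, a1, a2)
    theta_2 = nu2 + max(1.0 - n / np.sum((y - nu2) ** 2), 0.0) * (y - nu2)
    # data-driven loss estimates
    L_L = 1.0 - 1.0 / g(np.sum((y - ybar) ** 2) / n)
    kernel = np.sum(np.abs(y - ybar) <= delta) / (n * delta)
    L_2 = 1.0 - (1.0 - kernel * (a1 - a2)) / g(np.sum((y - nu2) ** 2) / n)
    return theta_L if L_L <= L_2 else theta_2
```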

28 Performance of Hybrid Estimator
Theorem: The loss function of the hybrid JS-estimator satisfies:
(1) For any $0 < \epsilon < 1$,
$$P\left(\frac{1}{n}\|\theta - \hat\theta_{hyb}\|^2 \ge \min\left\{\frac{1}{n}\|\theta - \hat\theta_L\|^2,\ \frac{1}{n}\|\theta - \hat\theta_2\|^2\right\} + \epsilon\right) \le K e^{-nk\epsilon^2}$$
where $K$ and $k$ are positive constants.
(2) For a sequence of $\theta$ with increasing dimension $n$, if $\limsup_n \|\theta\|^2/n < \infty$, we have
$$\lim_n \left(\frac{1}{n} R(\theta, \hat\theta_{hyb}) - \min\left\{\frac{1}{n} R(\theta, \hat\theta_L),\ \frac{1}{n} R(\theta, \hat\theta_2)\right\}\right) = 0$$
The proof involves showing that $\hat L(\theta, \hat\theta_L)$ and $\hat L(\theta, \hat\theta_2)$ concentrate around $\frac{1}{n}\|\theta - \hat\theta_L\|^2$ and $\frac{1}{n}\|\theta - \hat\theta_2\|^2$, respectively.

29 Simulation Results
[Figure: $n = 50$; normalized risk $R(\theta,\hat\theta)/n$ vs. $\tau$ for the regular JS estimator, Lindley's estimator, the two-cluster JS estimator, the hybrid JS estimator, and the ML estimator.]
The $\theta_i$'s are arranged in 2 clusters, one centered at $\tau$, the other at $-\tau$.
Each cluster has width $\tau/2$; the placement of points within a cluster is uniformly random.

30 Simulation Results
[Figure: $n = 1000$; normalized risk $R(\theta,\hat\theta)/n$ vs. $\tau$ for the regular JS estimator, Lindley's estimator, the two-cluster JS estimator, the hybrid JS estimator, and the ML estimator.]
The $\theta_i$'s are arranged in 2 clusters, one centered at $\tau$, the other at $-\tau$.
Each cluster has width $\tau/2$; the placement of points within a cluster is uniformly random.

31 Simulation Results
[Figure: $n = 1000$; normalized risk $R(\theta,\hat\theta)/n$ vs. $\tau$ for the regular JS estimator, Lindley's estimator, the two-cluster JS estimator, the hybrid JS estimator, and the ML estimator.]
The $\theta_i$'s are uniformly placed between $-\tau$ and $\tau$.

32 Summary
$$\hat\theta = \nu + \left[1 - \frac{n}{\|y - \nu\|^2}\right]_+ (y - \nu)$$
To achieve significant risk reduction over ML, a shrinkage estimator needs an attracting vector $\nu$ that is close to $\theta$.
Can we find a good $\nu(y)$, without any knowledge about $\theta$?
The proposed estimator infers the clustering structure of $\theta$ and picks a good target subspace for the attractor.
Can test for up to $L$ clusters for any integer $L \ge 2$, and pick the best one based on the loss estimate.
Provided concentration results for the loss function and convergence results for the risk.
Future Work: Test performance on real data sets; more general multi-dimensional target subspaces than cluster-based ones.
Cluster-Seeking James-Stein Estimators
