Shrinkage Estimation in High Dimensions
1 Shrinkage Estimation in High Dimensions
Pavan Srinath and Ramji Venkataramanan, University of Cambridge, ITA 2016
2 The Estimation Problem
$\theta \in \mathbb{R}^n$ is a vector of parameters, to be estimated from an observation $y$:
$y = \theta + w$, i.e., $y_i = \theta_i + w_i$ with $w_i$ i.i.d. $N(0, 1)$
Loss function of estimator $\hat\theta(y)$ is $\|\theta - \hat\theta(y)\|^2$
The normalized risk of the estimator is $R(\theta, \hat\theta) = \frac{1}{n} E\left[\|\hat\theta(y) - \theta\|^2\right]$
The expectation is calculated with the density $p_\theta(y) = (2\pi)^{-n/2} e^{-\|y - \theta\|^2/2}$
3 Maximum Likelihood Estimator
$y = \theta + w$, with $w_i$ i.i.d. $N(0, 1)$
The obvious estimator $\hat\theta(y) = y$ is also the ML estimator. But...
$R(\theta, \hat\theta_{ML}) = 1$ for all $\theta \in \mathbb{R}^n$

4 Maximum Likelihood Estimator (cont.)
For $n > 2$, there are estimators that do strictly better for all $\theta$ (James-Stein 1961)
5 James-Stein Estimator
$\hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$
$\hat\theta_{JS}$ shrinks each $y_i$ towards 0. Its risk is
$R\left(\theta, \hat\theta_{JS}\right) = 1 - \frac{(n-2)^2}{n} E\left[\frac{1}{\|y\|^2}\right] < 1$
[Plot: $R(\theta, \hat\theta)/n$ versus $\|\theta\|$ for the ML and regular JS estimators; $n = 10$, $\theta_i = c$ for $i = 1, \ldots, 5$ and $\theta_i = -c$ for $i = 6, \ldots, 10$]
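As a concrete illustration, here is a minimal NumPy sketch of the James-Stein estimator (the variable names and the experiment setup are mine, not from the talk):

```python
import numpy as np

def james_stein(y):
    """James-Stein estimate: shrink y towards the all-zeros vector."""
    n = len(y)
    return (1.0 - (n - 2) / np.sum(y ** 2)) * y

rng = np.random.default_rng(0)
n = 100
theta = np.zeros(n)                      # true parameter sits at the attractor 0
y = theta + rng.standard_normal(n)       # y = theta + w, w_i ~ N(0, 1)
loss_js = np.mean((james_stein(y) - theta) ** 2)
loss_ml = np.mean((y - theta) ** 2)      # ML loss, close to 1 on average
```

With $\theta$ at the attractor, the JS loss is typically a small fraction of the ML loss; the gain fades as $\|\theta\|$ grows, which is the point of the plots that follow.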
6 Intuition
$y = \theta + w$, with $w_i$ i.i.d. $N(0, 1)$; $\hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$
Why should the estimate of $\theta_1$ depend on all the $y_i$'s?

7 Intuition (cont.)
Note that the loss function is $\frac{1}{n}\left((\theta_1 - \hat\theta_1)^2 + \cdots + (\theta_n - \hat\theta_n)^2\right)$

8 Intuition (cont.)
Empirical Bayes: If we assume $\theta_i$ i.i.d. $N(0, A)$, then $\hat\theta_{MMSE} = \left(1 - \frac{1}{1+A}\right) y$
If we don't know $A$, estimate $\frac{1}{1+A}$ by $\frac{n-2}{\|y\|^2}$, since $E\left[\frac{n-2}{\|y\|^2}\right] = \frac{1}{1+A}$
This gives $\hat\theta_{JS}$, but the surprise is that it beats ML for any $\theta$!
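The empirical-Bayes identity $E\left[\frac{n-2}{\|y\|^2}\right] = \frac{1}{1+A}$ is easy to check numerically; a quick Monte Carlo sketch (constants chosen by me):

```python
import numpy as np

rng = np.random.default_rng(1)
n, A = 1000, 3.0
# Under theta_i ~ N(0, A) and y = theta + w, marginally y ~ N(0, (1 + A) I_n)
vals = []
for _ in range(500):
    y = np.sqrt(1 + A) * rng.standard_normal(n)
    vals.append((n - 2) / np.sum(y ** 2))
estimate = np.mean(vals)   # should be close to 1 / (1 + A) = 0.25
```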
9 The Attracting Vector
$\hat\theta_{JS} = \left[1 - \frac{n-2}{\|y\|^2}\right] y$
$\hat\theta_{JS}$ shrinks $y$ towards the all-zeros vector $\mathbf{0}$
Risk is smaller when $\theta$ is closer to $\mathbf{0}$, i.e., when $\|\theta\|$ is small
[Plot: $R(\theta, \hat\theta)/n$ versus $\|\theta\|$ for the ML and regular JS estimators]
10 The Attracting Vector
In general, can shrink towards any vector $\alpha$:
$\hat\theta = \alpha + \left[1 - \frac{n-2}{\|y - \alpha\|^2}\right](y - \alpha)$
Risk of $\hat\theta$ decreases as $\|\theta - \alpha\|$ gets smaller

11 The Attracting Vector (cont.)
Lindley's estimator: based on the assumption that the $\theta_i$'s lie close to their average $\bar\theta$ ($\approx \bar y$):
$\hat\theta_L = \bar y \mathbf{1} + \left[1 - \frac{n-3}{\|y - \bar y \mathbf{1}\|^2}\right](y - \bar y \mathbf{1})$, where $\bar y = \frac{1}{n}\sum_i y_i$
$\hat\theta_L$ has been applied to baseball data, disease incidence data, ... [Efron-Morris '75]
Risk reduction of a JS-like estimator over ML is greatest when $\theta$ is close to the attracting vector
12 Positive-Part Version
$\hat\theta_L = \bar y \mathbf{1} + \left[1 - \frac{n-3}{\|y - \bar y \mathbf{1}\|^2}\right]_+ (y - \bar y \mathbf{1})$, where $\bar y = \frac{1}{n}\sum_i y_i$
When the shrinkage factor becomes negative, replace it by 0

13 Positive-Part Version (cont.)
[Plot: $R(\theta, \hat\theta)/n$ versus $\|\theta\|$ for the ML estimator and the positive-part JS and Lindley estimators; $n = 10$, $\theta_i = c$ for all $i$]
$\hat\theta_L$ does best when the $\theta_i$'s are close to their empirical mean $\bar\theta$
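A sketch of the positive-part Lindley estimator in the same NumPy style (the experiment setup is mine):

```python
import numpy as np

def lindley_plus(y):
    """Positive-part Lindley estimator: shrink y towards its empirical mean."""
    n = len(y)
    resid = y - y.mean()
    factor = max(0.0, 1.0 - (n - 3) / np.sum(resid ** 2))  # clip negative factor to 0
    return y.mean() + factor * resid

rng = np.random.default_rng(2)
theta = np.full(200, 3.0)                # all theta_i equal: ideal case for Lindley
y = theta + rng.standard_normal(200)
loss_lindley = np.mean((lindley_plus(y) - theta) ** 2)
loss_ml = np.mean((y - theta) ** 2)
```

When the components share a common mean, the residual $y - \bar y \mathbf{1}$ is almost pure noise, the shrinkage factor is near zero, and the estimate collapses towards $\bar y \mathbf{1}$, which is exactly the regime where Lindley's estimator wins.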
14 Another example
[Plot: $R(\theta, \hat\theta)/n$ versus $\|\theta\|$ for the ML estimator and the positive-part JS and Lindley estimators; $n = 10$, $\theta_i = c$ for $i \le 5$, and $\theta_i = -c$ for $6 \le i \le 10$]
What would be a good attracting vector for this example?
Shrink the $y_i$'s that are $> \bar y$ towards $c$, the rest towards $-c$.
But we don't know anything about $\theta$!
15 [Plots: components of $\theta$ in 2 clusters; components of $\theta$ in 1 cluster]
In the absence of prior information about $\theta$, how do we choose an attracting vector that is close to $\theta$?
Can we use the data to pick a good attracting vector tailored to the underlying $\theta$?

16 Idea:
1) Design a good estimator $\hat\theta_2$ for $\theta$'s whose components are roughly separable into two clusters
2) Then, from $y$, try to infer which is better: $\hat\theta_2$ or $\hat\theta_L$
17 A Two-Cluster Estimator
Define two clusters $C_1 := \{y_i : y_i > \bar y\}$, $C_2 := \{y_i : y_i \le \bar y\}$.
Shrink the $y_i$'s in $C_1$ towards $a_1$, the rest towards $a_2$.

18 A Two-Cluster Estimator (cont.)
Attracting vector: $\nu_2 = a_1 \left[1\{y_1 > \bar y\}, \ldots, 1\{y_n > \bar y\}\right]^T + a_2 \left[1\{y_1 \le \bar y\}, \ldots, 1\{y_n \le \bar y\}\right]^T$
The estimator is $\hat\theta_2 = \nu_2 + \left[1 - \frac{n}{\|y - \nu_2\|^2}\right]_+ (y - \nu_2)$
How to choose $a_1$ and $a_2$?
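A sketch of the two-cluster estimator with the attractor levels $a_1$, $a_2$ passed in by the caller (the next slides estimate them from $y$; here they are simply arguments, and the test setup is mine):

```python
import numpy as np

def two_cluster(y, a1, a2):
    """Shrink towards nu2: level a1 on coordinates above the mean, a2 on the rest."""
    n = len(y)
    nu2 = np.where(y > y.mean(), a1, a2)             # attracting vector
    resid = y - nu2
    factor = max(0.0, 1.0 - n / np.sum(resid ** 2))  # positive-part shrinkage
    return nu2 + factor * resid

rng = np.random.default_rng(3)
c = 4.0
theta = np.concatenate([np.full(100, c), np.full(100, -c)])  # two well-separated clusters
y = theta + rng.standard_normal(200)
loss_2 = np.mean((two_cluster(y, c, -c) - theta) ** 2)       # ideal attractors
loss_ml = np.mean((y - theta) ** 2)
```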
19 The Ideal Attractors
Attracting vector $\nu_2$ lies in a 2-d subspace spanned by $\left[1\{y_1 > \bar y\}, \ldots, 1\{y_n > \bar y\}\right]^T$ and $\left[1\{y_1 \le \bar y\}, \ldots, 1\{y_n \le \bar y\}\right]^T$
Key Idea: the best attractor is the vector in this subspace that is closest to $\theta$

20 The Ideal Attractors (cont.)
Projection of $\theta$ onto this subspace:
$\nu_2^* = a_1^* \left[1\{y_1 > \bar y\}, \ldots, 1\{y_n > \bar y\}\right]^T + a_2^* \left[1\{y_1 \le \bar y\}, \ldots, 1\{y_n \le \bar y\}\right]^T$
$a_1^* = \frac{\sum_{i=1}^n \theta_i 1\{y_i > \bar y\}}{\sum_{i=1}^n 1\{y_i > \bar y\}}$, $a_2^* = \frac{\sum_{i=1}^n \theta_i 1\{y_i \le \bar y\}}{\sum_{i=1}^n 1\{y_i \le \bar y\}}$
$a_1^*$, $a_2^*$ cannot be computed, but can we estimate them from $y$?
21 Estimating the Attractors
$\hat a_1 = \frac{\sum_{i=1}^n y_i 1\{y_i > \bar y\} - \frac{1}{2\delta}\sum_{i=1}^n 1\{|y_i - \bar y| \le \delta\}}{\sum_{i=1}^n 1\{y_i > \bar y\}}$, $\hat a_2 = \frac{\sum_{i=1}^n y_i 1\{y_i \le \bar y\} + \frac{1}{2\delta}\sum_{i=1}^n 1\{|y_i - \bar y| \le \delta\}}{\sum_{i=1}^n 1\{y_i \le \bar y\}}$
$\delta$ is a constant that should be chosen small but $\gg 1/n$
Concentration of $\hat a_1$, $\hat a_2$: for any $0 < \epsilon < 1$,
$P\left(\left|\hat a_1 - \left(a_1^* + \kappa_1 \delta + o(\delta)\right)\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}$
$P\left(\left|\hat a_2 - \left(a_2^* + \kappa_2 \delta + o(\delta)\right)\right| \ge \epsilon\right) \le K' e^{-nk'\epsilon^2}$
22 Risk of Two-Cluster Estimator
$\hat\theta_2 = \nu_2 + \left[1 - \frac{n}{\|y - \nu_2\|^2}\right]_+ (y - \nu_2)$
Theorem: The loss function of the two-cluster estimator $\hat\theta_2$ satisfies:
(1) For any $0 < \epsilon < 1$,
$P\left(\left|\frac{1}{n}\|\theta - \hat\theta_2\|^2 - \left[\min\left(\beta_n, \frac{\beta_n}{\alpha_n + 1}\right) + \kappa_n \delta + o(\delta)\right]\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}$
where $\alpha_n$, $\beta_n$ are explicit constants that depend on $\theta$

23 Risk of Two-Cluster Estimator (cont.)
(2) For a sequence of $\theta$ with increasing dimension $n$, if $\limsup_n \|\theta\|^2/n < \infty$, we have
$\lim_{n\to\infty} \left(R\left(\theta, \hat\theta_2\right) - \left[\min\left(\beta_n, \frac{\beta_n}{\alpha_n + 1}\right) + \kappa_n \delta + o(\delta)\right]\right) = 0$
24 Risk of Lindley's Estimator
Note that $\hat\theta_L$ is a one-cluster estimator: the best attractor in the 1-d subspace spanned by $\mathbf{1}$ is $\bar\theta \mathbf{1}$; $\hat\theta_L$ approximates it by $\bar y \mathbf{1}$
Corollary: The loss function of Lindley's estimator satisfies:
(1) For any $0 < \epsilon < 1$,
$P\left(\left|\frac{\|\theta - \hat\theta_L\|^2}{n} - \frac{\rho_n}{\rho_n + 1}\right| \ge \epsilon\right) \le K e^{-nk\epsilon^2}$, where $\rho_n = \frac{\|\theta - \bar\theta \mathbf{1}\|^2}{n}$.
(2) If $\limsup_n \|\theta\|^2/n < \infty$, we have $\lim_{n\to\infty} \left(R\left(\theta, \hat\theta_L\right) - \frac{\rho_n}{\rho_n + 1}\right) = 0$.
25 Picking the Better Estimator
Depending on $\theta$, either $\hat\theta_2$ or $\hat\theta_L$ may have lower risk
[Plots: components of $\theta$ in 2 clusters (expect $\hat\theta_2$ to be better); components of $\theta$ in 1 cluster (expect $\hat\theta_L$ to be better)]
For $\theta$ whose components are roughly separable into two clusters, expect $\hat\theta_2$ to have lower risk than $\hat\theta_L$
For $\theta$ whose components are clustered around one value, expect $\hat\theta_L$ to have lower risk
How to pick the better estimator?

26 IDEA: Estimate the loss of each estimator from the data
27 Hybrid Estimator
Loss Estimates:
$\hat L(\theta, \hat\theta_L) = 1 - \frac{1}{g\left(\|y - \bar y \mathbf{1}\|^2 / n\right)}$
$\hat L(\theta, \hat\theta_2) = 1 - \frac{1 - \frac{1}{n\delta}\sum_{i=1}^n 1\{|y_i - \bar y| \le \delta\}\,(\hat a_1 - \hat a_2)}{g\left(\|y - \nu_2\|^2 / n\right)}$
where $g(x) = \max\{x, 1\}$
Hybrid Estimator: choose
$\hat\theta_{hyb} = \hat\theta_L$ if $\hat L(\theta, \hat\theta_L) \le \hat L(\theta, \hat\theta_2)$, and $\hat\theta_{hyb} = \hat\theta_2$ otherwise
Different from the approach in [George '86]: convex combinations of multiple shrinkage estimators with fixed attracting subspaces
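A simplified sketch of the hybrid selection rule. The slide's loss estimate for $\hat\theta_2$ carries a $\delta$-dependent correction term; this sketch drops it and uses the plain plug-in estimate $1 - 1/g(\|y - \nu\|^2/n)$ for both candidates, and it sets the cluster levels to the naive within-cluster means rather than the corrected estimates $\hat a_1$, $\hat a_2$:

```python
import numpy as np

def shrink(y, nu, c):
    """Positive-part shrinkage of y towards attractor nu, with numerator c."""
    factor = max(0.0, 1.0 - c / np.sum((y - nu) ** 2))
    return nu + factor * (y - nu)

def loss_estimate(y, nu):
    """Plug-in loss estimate 1 - 1/g(||y - nu||^2 / n), with g(x) = max(x, 1)."""
    n = len(y)
    return 1.0 - 1.0 / max(np.sum((y - nu) ** 2) / n, 1.0)

def hybrid(y):
    """Pick between the Lindley and two-cluster estimates via estimated losses."""
    n = len(y)
    ybar = y.mean()
    nu_L = np.full(n, ybar)                  # one-cluster attractor
    upper = y > ybar                         # naive cluster split at the mean
    nu_2 = np.where(upper, y[upper].mean(), y[~upper].mean())
    if loss_estimate(y, nu_L) <= loss_estimate(y, nu_2):
        return shrink(y, nu_L, n - 3)
    return shrink(y, nu_2, n)

rng = np.random.default_rng(4)
theta = np.concatenate([np.full(100, 4.0), np.full(100, -4.0)])
y = theta + rng.standard_normal(200)
loss_hyb = np.mean((hybrid(y) - theta) ** 2)
loss_ml = np.mean((y - theta) ** 2)
```

On a clearly two-clustered $\theta$ the estimated loss of the two-cluster attractor is much smaller, so the rule selects $\hat\theta_2$; on a single cluster the comparison flips.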
28 Performance of Hybrid Estimator
Theorem: The loss function of the hybrid JS-estimator satisfies:
(1) For any $0 < \epsilon < 1$,
$P\left(\frac{1}{n}\|\theta - \hat\theta_{hyb}\|^2 \ge \min\left\{\frac{1}{n}\|\theta - \hat\theta_L\|^2, \frac{1}{n}\|\theta - \hat\theta_2\|^2\right\} + \epsilon\right) \le K e^{-nk\epsilon^2}$
where $K$ and $k$ are positive constants.
(2) For a sequence of $\theta$ with increasing dimension $n$, if $\limsup_n \|\theta\|^2/n < \infty$, we have
$\lim_{n\to\infty} \left(R(\theta, \hat\theta_{hyb}) - \min\left\{R(\theta, \hat\theta_L), R(\theta, \hat\theta_2)\right\}\right) = 0$
Proof involves showing $\hat L(\theta, \hat\theta_L)$, $\hat L(\theta, \hat\theta_2)$ concentrate around $\frac{1}{n}\|\theta - \hat\theta_L\|^2$, $\frac{1}{n}\|\theta - \hat\theta_2\|^2$, respectively.
29 Simulation Results ($n = 50$)
[Plot: $R(\theta, \hat\theta)/n$ versus $\tau$ for the regular JS, Lindley, two-cluster JS, hybrid JS, and ML estimators]
$\theta_i$'s arranged in 2 clusters, one centered at $\tau$, the other at $-\tau$
Each cluster has width $\tau/2$; placement of points within a cluster is uniformly random

30 Simulation Results ($n = 1000$)
[Plot: same estimators and setup as above, with $n = 1000$]
$\theta_i$'s arranged in 2 clusters, one centered at $\tau$, the other at $-\tau$
Each cluster has width $\tau/2$; placement of points within a cluster is uniformly random

31 Simulation Results ($n = 1000$)
[Plot: $R(\theta, \hat\theta)/n$ versus $\tau$ for the regular JS, Lindley, two-cluster JS, hybrid JS, and ML estimators]
$\theta_i$'s are uniformly placed between $-\tau$ and $\tau$
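The qualitative behaviour in these plots, with the regular JS estimator giving almost no gain once the clusters move away from 0, can be reproduced with a small Monte Carlo (setup simplified to zero-width clusters; all constants are mine):

```python
import numpy as np

rng = np.random.default_rng(5)

def risk(estimator, theta, trials=200):
    """Monte Carlo estimate of the normalized risk E||est(y) - theta||^2 / n."""
    losses = [
        np.mean((estimator(theta + rng.standard_normal(len(theta))) - theta) ** 2)
        for _ in range(trials)
    ]
    return float(np.mean(losses))

def james_stein(y):
    return (1.0 - (len(y) - 2) / np.sum(y ** 2)) * y

tau, n = 5.0, 100
theta = np.concatenate([np.full(n // 2, tau), np.full(n // 2, -tau)])
r_js = risk(james_stein, theta)   # close to the ML risk of 1: theta is far from 0
```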
32 Summary
$\hat\theta = \nu + \left[1 - \frac{n}{\|y - \nu\|^2}\right]_+ (y - \nu)$
To achieve significant risk reduction over ML, a shrinkage estimator needs an attracting vector $\nu$ that is close to $\theta$. Can we find a good $\nu(y)$, without any knowledge about $\theta$?
The proposed estimator infers the clustering structure of $\theta$ and picks a good target subspace for the attractor.
Can test for up to $L$ clusters for any integer $L \ge 2$, and pick the best one based on the loss estimate.
Provided concentration results for the loss function and convergence results for the risk.
Future Work: test performance on real data sets; more general multi-dimensional target subspaces than cluster-based ones.
Paper: Cluster-Seeking James-Stein Estimators
MGMT 69000: Topics in High-dimensional Data Analysis Falll 2016 Lecture 14: Information Theoretic Methods Lecturer: Jiaming Xu Scribe: Hilda Ibriga, Adarsh Barik, December 02, 2016 Outline f-divergence
More informationSequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk
Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Axel Gandy Department of Mathematics Imperial College London a.gandy@imperial.ac.uk user! 2009, Rennes July 8-10, 2009
More informationSupplementary Material to Clustering in Complex Latent Variable Models with Many Covariates
Statistica Sinica: Supplement Supplementary Material to Clustering in Complex Latent Variable Models with Many Covariates Ya Su 1, Jill Reedy 2 and Raymond J. Carroll 1,3 1 Department of Statistics, Texas
More informationRandom Locations of Periodic Stationary Processes
Random Locations of Periodic Stationary Processes Yi Shen Department of Statistics and Actuarial Science University of Waterloo Joint work with Jie Shen and Ruodu Wang May 3, 2016 The Fields Institute
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationL p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by
L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may
More informationComputational and Statistical Tradeoffs via Convex Relaxation
Computational and Statistical Tradeoffs via Convex Relaxation Venkat Chandrasekaran Caltech Joint work with Michael Jordan Time-constrained Inference o Require decision after a fixed (usually small) amount
More informationAnswer Key for STAT 200B HW No. 7
Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;
More informationA D VA N C E D P R O B A B I L - I T Y
A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationAsymmetric least squares estimation and testing
Asymmetric least squares estimation and testing Whitney Newey and James Powell Princeton University and University of Wisconsin-Madison January 27, 2012 Outline ALS estimators Large sample properties Asymptotic
More informationIEOR E4703: Monte-Carlo Simulation
IEOR E4703: Monte-Carlo Simulation Output Analysis for Monte-Carlo Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Output Analysis
More informationPhase Transitions for Dilute Particle Systems with Lennard-Jones Potential
Weierstrass Institute for Applied Analysis and Stochastics Phase Transitions for Dilute Particle Systems with Lennard-Jones Potential Wolfgang König TU Berlin and WIAS joint with Andrea Collevecchio (Venice),
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationGraduate Econometrics I: Asymptotic Theory
Graduate Econometrics I: Asymptotic Theory Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Asymptotic Theory
More information