Empirical Bayes Quantile Prediction, aka E-B Prediction under Check-loss
1 BFF4, May 2, 2017. Empirical Bayes Quantile Prediction, aka E-B Prediction under Check-loss. Lawrence D. Brown, Wharton School, Univ. of Pennsylvania. Joint work with Gourab Mukherjee and Paat Rusmevichientong, Marshall School, U.S.C. Preprint available on arXiv. NOTE: In several places the symbol ν should read ν². This correction should be clear from the context.
2 Multiple Independent Gaussian Problems. n independent problems, indexed by i = 1,..,n. Observe X_i ~ N(θ_i, ν²_{P,i}) [θ_i unknown; ν²_{P,i} known]. [In applications, X_i might be X_i = X̄_i with X_{ik} iid N(θ_i, σ²_i), k = 1,..,K_i, so that ν²_{P,i} = σ²_i/K_i.]
3 Quantile Prediction. For each i there is a potential future observation Y_i ~ N(θ_i, ν²_{F,i}). [Again this might be the average of several iid variables.] Note that for each i the mean θ_i is the same for both Past and Future. Fix b_i ∈ (0,1), i = 1,..,n. The goal is to predict the b_i-th quantile of the distribution of Y_i for each index i. This quantile is q_{0,i} = θ_i + ν_{F,i} Φ^{-1}(b_i). The naïve coordinate-wise predictor is q̂_{0,i} = X_i + ν_{F,i} Φ^{-1}(b_i). NOTE: This is formal Bayes for the uniform prior.
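As a concrete illustration (not from the slides; the function and argument names are ours), the naïve coordinate-wise predictor is one line per coordinate. Here v_f holds the future standard deviations ν_{F,i}:

```python
from statistics import NormalDist

def naive_quantile_predictor(x, v_f, b):
    """Naive coordinate-wise predictor: qhat_{0,i} = X_i + nu_{F,i} * Phi^{-1}(b_i)."""
    z = NormalDist()  # standard normal; inv_cdf is Phi^{-1}
    return [xi + vi * z.inv_cdf(bi) for xi, vi, bi in zip(x, v_f, b)]

# With b_i = 0.5 the predicted quantile is the median, i.e. X_i itself;
# with b_i = 0.9 the predictor shifts X_i up by nu_{F,i} * Phi^{-1}(0.9).
print(naive_quantile_predictor([1.0, -2.0], [1.0, 2.0], [0.5, 0.9]))
```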
4 Shrinkage Estimation. Generally beneficial in multi-mean problems (and many other multivariate settings). But the shrinkage needs to be properly implemented. For homoscedastic problems this is well understood (Stein estimation). But the current problem is quite heteroscedastic: ν_{P,i}, ν_{F,i}, b_i can all depend on i. For such problems minimax shrinkage may differ dramatically from Empirical Bayes shrinkage; and well-implemented E-B shrinkage is usually preferable. See Efron and Morris (1973, 1975) and Xie, Kou, Brown (2012, 2015).
5 Parametric Empirical Bayes. We follow the hierarchical conjugate-prior paradigm: Efron and Morris (op. cit.), Stein (1962), Lindley (1962), Lindley and Smith (1972). θ_1,..,θ_n iid ~ N(η, τ²). η, τ² are unknown hyper-parameters to be estimated from the data. Let η̂, τ̂² denote the estimates. For expository and notational simplicity, consider (for the talk) the case where η = 0 is known. (The paper allows unknown η and estimates it along with τ².)
6 Quantile Prediction for Known Hyper-parameters. If τ² is known then the posterior distribution of θ is known (& Gaussian). This yields a known, Gaussian predictive distribution for each future Y_i: let α_i = τ²/(τ² + ν²_{P,i}); then F_{Y_i}(y | τ², X_i) = N(α_i X_i, ν²_{F,i} + α_i ν²_{P,i}). So the quantile prediction is q̂(X_i; τ²) = α_i X_i + (ν²_{F,i} + α_i ν²_{P,i})^{1/2} Φ^{-1}(b_i). The role of the E-B prior structure is only to motivate this family of quantile predictors.
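A sketch of this predictor family in code (our own function and variable names; v2_p and v2_f denote the variances ν²_{P,i} and ν²_{F,i}, and η = 0 as in the talk):

```python
from math import sqrt
from statistics import NormalDist

def eb_quantile_predictor(x, v2_p, v2_f, b, tau2):
    """q(X_i; tau^2) = alpha_i X_i + sqrt(v2_f_i + alpha_i v2_p_i) * Phi^{-1}(b_i),
    with shrinkage factor alpha_i = tau^2 / (tau^2 + v2_p_i)."""
    z = NormalDist()
    out = []
    for xi, vp, vf, bi in zip(x, v2_p, v2_f, b):
        alpha = tau2 / (tau2 + vp)
        out.append(alpha * xi + sqrt(vf + alpha * vp) * z.inv_cdf(bi))
    return out

# tau2 = 0 shrinks the predictive mean all the way to the prior mean eta = 0;
# tau2 -> infinity (alpha -> 1) recovers the naive predictor.
print(eb_quantile_predictor([2.0, -1.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5], tau2=0.0))
```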
7 Direct E-B quantile prediction. Could use the data plus any plausible estimate of τ², the hyper-parameter(s), and plug in. I.e., could use the marginal MLE τ̂² and get the prediction q(X_i; τ̂²). Or could similarly use the method-of-moments estimate instead of the MLE. These (and other) plausible estimates of τ² are different & give different predictors. They're not bad. BUT none of these needs to be the best choice. See Xie, Kou, Brown (2012) about the more familiar estimation-of-means problem.
8 Formulation with Loss Function. To investigate the optimal choice of τ² in q(X_i; τ²), impose a predictive loss function under which q̂_τ = q(X_i; τ²) is the natural predictor of q when τ² is known. Then find the best (or, just a good) τ̂² = τ̂²(X) under this loss. Use q̂ = q(X_i; τ̂²) for each i. Check-Loss (aka pinball loss or quantile loss): let b, h > 0, normalized by b + h = 1 (w.l.o.g.). The i-th component of the predictive loss is l_i(Y_i, q) = b_i(Y_i − q)_+ + h_i(q − Y_i)_+.
9 Note that b_i, h_i are known and allowed to depend on i. [Figure: plot of the check loss as a function of Y − q, for b = 0.7.]
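In code the i-th loss component is simply (our sketch; b plays the role of b_i and h = 1 − b):

```python
def check_loss(y, q, b):
    """Check (pinball) loss, normalized so that b + h = 1:
    l(y, q) = b*(y - q)_+ + (1 - b)*(q - y)_+."""
    h = 1.0 - b
    return b * max(y - q, 0.0) + h * max(q - y, 0.0)

# With b = 0.7, under-predicting by one unit costs 0.7 while over-predicting
# by one unit costs only 0.3; this asymmetry is what makes the b-th quantile
# the optimal predictor under this loss.
assert abs(check_loss(2.0, 1.0, 0.7) - 0.7) < 1e-12  # under-prediction
assert abs(check_loss(1.0, 2.0, 0.7) - 0.3) < 1e-12  # over-prediction
```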
10 This loss has been introduced here as a device to evaluate potential quantile predictors. But it has motivation in various applications. For example:
11 [Figure only.]
12 Improved E-B quantile prediction: Conceptual Idea. Define the average predictive risk as R(θ, q) = n^{-1} Σ_i E_{θ_i}(l_i(Y_i, q_i(X))). When using a particular hyper-parameter estimator τ̂² = τ̂²(X), this is R(θ; τ̂²) = n^{-1} Σ_i E_{θ_i}(l_i(Y_i, q_i(X_i; τ̂²(X)))). The goal is to create a good/best estimator τ̂².
13 KEY is to find a good and (nearly) unbiased estimator of n^{-1} Σ_i E_{θ_i}(l_i(Y_i, q_i(X_i; τ²))). [Note what happened to τ̂²: it has become a fixed τ².] Call it R̂E(X; τ²). [R̂E is a function of X, but not of Y.] This will have
(1) E_θ(R̂E(X; τ²)) ≈ n^{-1} Σ_i E_{θ_i}(l_i(Y_i, q_i(X_i; τ²))).
Then choose τ̂²_{RE}(X) = argmin_{τ²} {R̂E(X; τ²)}.
14 Why does this yield the best τ² (as n → ∞)? Because there is a best τ² (as n → ∞): the Oracle Predictor. For every θ let τ̂²_{OR} = τ̂²_{OR}(X; θ) = argmin_{τ²} R(θ; τ²). Then show [as n → ∞, for every (L1-bounded) sequence θ] that
(2) τ̂²_{RE}(X) − τ̂²_{OR}(X; θ) → 0 in prob., and
(3) R(θ; τ̂²_{RE}) − R(θ; τ̂²_{OR}) → 0.
This conceptual scheme involves two major steps:
15 Two Major Steps. Step 1: Create the (asymptotic) risk estimator R̂E(τ²) so as to yield a suitable approximation (1). Step 2: Prove it has the desired convergence and risk properties (2) and (3). The two steps are interrelated: R̂E(τ²) needs to produce a satisfactory approximation in (1), and be computationally feasible. AND it also needs to enable Step 2.
16 Step 1: Asymptotic Risk Estimator. Need to satisfy (1) E_θ(R̂E(X; τ²)) ≈ n^{-1} Σ_i E_{θ_i}(l_i(Y_i, q_i(X_i; τ²))), where l_i is check-loss. If l_i were squared error then one could use SURE. But l_i is not differentiable. So something like SURE seems a big stretch!
17 HOWEVER, l*_i(θ_i, q) := E_{θ_i}(l_i(Y_i, q)) = ν_{F,i} G((q − θ_i)/ν_{F,i}, b_i), where G(w, b) = φ(w) + wΦ(w) − bw, plays the role of a conventional loss function. G is a smooth, C^∞, function, and hence so is l*_i(θ_i, q) as a function of q. Here's a picture of G for b = 0.7:
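The identity is easy to verify numerically. The following sketch (our code; the specific parameter values are arbitrary) implements G and checks the closed form against a Monte Carlo average of the check loss:

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def G(w, b):
    """G(w, b) = phi(w) + w*Phi(w) - b*w  (smooth, C-infinity in w)."""
    return Z.pdf(w) + w * Z.cdf(w) - b * w

def expected_check_loss(theta, q, nu_f, b):
    """l*(theta, q) = E[ b(Y - q)_+ + (1-b)(q - Y)_+ ] for Y ~ N(theta, nu_f^2),
    via the closed form nu_f * G((q - theta)/nu_f, b)."""
    return nu_f * G((q - theta) / nu_f, b)

# Monte Carlo check of the closed form at arbitrary (hypothetical) values.
random.seed(0)
theta, nu_f, b, q = 1.0, 1.5, 0.7, 1.8
draws = [random.gauss(theta, nu_f) for _ in range(200_000)]
mc = sum(b * max(y - q, 0.0) + (1 - b) * max(q - y, 0.0) for y in draws) / len(draws)
print(mc, expected_check_loss(theta, q, nu_f, b))  # the two agree up to Monte Carlo error
```

Since G'(w, b) = Φ(w) − b, the unique minimizer of l* in q is q = θ + ν_F Φ^{-1}(b), recovering the true quantile q_0.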
18 [Figure: plot of the function G for b = 0.7, together with its linear tail approximation and an approximation-theory-based non-linear approximation of the loss.]
19 Creation of the Asymptotic Risk Estimator (ÂRE). For ν_{P,i} = 1 (for simplicity):
(a) Approximate G(w, b_i) by Taylor expansion ("TE") to K(i) terms.
(b) Substitute w = q_i(X_i; τ²) − θ_i.
(c) The resulting expectation is, say, E_{θ_i}(TE(X_i, θ_i)). It includes powers of θ_i times functions of X_i.
(d) Integrate by parts to create an equivalent expectation that contains only functions of X_i & no terms with θ_i.
(e) Step (d) could be understood as a version of SURE generalized to higher powers of θ_i.
20 (f) Actual computation is greatly facilitated by use of Hermite polynomials.
(g) The resulting expectand is a function of X_i that is an unbiased estimate of the expected Taylor approximation. That's the core of our ÂRE.
(h) There are some additional (clever) steps needed to handle large values of w, for which the Taylor expansion is not good.
(i) Just to impress you, here's the core of ÂRE without the additional steps in (h): (In the following, U_i(τ) is a minor truncation/modification of X_i and d_i(τ) is a simple rational function of τ, ν_{P,i}, b_i. This is copied from the paper, which uses the notation τ in place of τ².)
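The reason Hermite polynomials help in steps (d)-(f) is the classical identity that for X ~ N(θ, 1), E[He_k(X)] = θ^k, where He_k is the probabilists' Hermite polynomial; so He_k(X_i) is an exactly unbiased estimate of θ_i^k. A quick numerical illustration (our code, not from the paper):

```python
import random

def hermite(k, x):
    """Probabilists' Hermite polynomial He_k(x), via the recurrence
    He_{k+1}(x) = x*He_k(x) - k*He_{k-1}(x); He_0 = 1, He_1 = x."""
    h_prev, h = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, x * h - j * h_prev
    return h

# If X ~ N(theta, 1) then E[He_k(X)] = theta^k: an unbiased estimate of each
# power of theta, which is exactly what the integration-by-parts step needs.
random.seed(1)
theta = 0.8
xs = [random.gauss(theta, 1.0) for _ in range(100_000)]
ests = {k: sum(hermite(k, x) for x in xs) / len(xs) for k in (1, 2, 3)}
for k, est in ests.items():
    print(k, round(est, 3), theta ** k)
```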
21 (j) Choose K_n(i), the number of terms in the Taylor expansion. This varies with i, but is O(log n).
(k) Then add over i to get ÂRE.
(l) Finally, find the desired argmin_{τ²} ÂRE(τ²) via a discrete grid search.
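Step (l) is an ordinary one-dimensional discrete search. A minimal sketch (the quadratic stand-in below is NOT the paper's ÂRE, just a placeholder objective with a known minimum):

```python
def grid_argmin(risk_estimate, grid):
    """Return the grid point minimizing the estimated risk (step (l)).
    risk_estimate : callable tau2 -> estimated average risk, e.g. ARE(X; tau2)
    grid          : iterable of candidate tau^2 values
    """
    return min(grid, key=risk_estimate)

# Placeholder objective with a known minimum at tau^2 = 3.0.
tau2_grid = [j / 10 for j in range(1, 101)]  # 0.1, 0.2, ..., 10.0
tau2_hat = grid_argmin(lambda t: (t - 3.0) ** 2, tau2_grid)
print(tau2_hat)  # 3.0
```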
22 Step 2: Prove the desired asymptotic properties. There are some clues in Xie, Kou and Brown (2012). But parts of the proof here differ in key respects from what's there, and the details here are much more complex.
23 Simulation #1: Homoscedastic (σ² = 1). Two types of (θ_i, b_i) pairs: (0.58, 0.51) with prob 0.9, and (5.1, 0.9) with prob 0.1. Comparison of predictive risks of three prediction methods (efficiency relative to the oracle, and average of τ̂², at n = 20 and n = 50 coordinates):

Method    Efficiency (n=20)    Ave. of τ̂² (n=20)    Efficiency (n=50)    Ave. of τ̂² (n=50)
ARE       86%                  --                    --                   --
ML/MM     68%                  --                    --                   --
Oracle    100%                 --                    --                   --

Note: Because the model is homoscedastic, MaxLik and MethMom are the same.
24 Simulations #3. This was a set of 6 simulations involving different scenarios, all with heteroscedastic data and random choices for the parameters and b_i. E.g., in the first setup (the simplest): ν²_P ~ U(0.1, 0.33), ν²_F = 1, θ ~ U(0, 1), b ~ U(0.5, 0.99), all indep. In these 6 scenarios the efficiency of our ARE predictor relative to the oracle was: for n = 20, 88% < Eff. < 98%; for n = 100, 91% < Eff. < 99%. [Note: The last scenario involved uniformly distributed observations, rather than normally distributed ones. But it isn't reported in the manuscript whether the oracle knew and used that fact. I'll need to ask Gourab and Paat.]
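None of the paper's numbers can be reproduced here, but the shape of such an experiment is easy to sketch. The toy below is entirely our own construction (normal prior, shared unit variances, oracle chosen by grid search over the exact conditional risk, which is computable in simulation because θ is known); it compares the oracle-tuned shrinkage predictor with the naive one:

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def G(w, b):
    # smoothed check loss: E[check loss] = nu_F * G((q - theta)/nu_F, b)
    return Z.pdf(w) + w * Z.cdf(w) - b * w

def avg_risk(thetas, xs, bs, tau2, v2_p=1.0, v2_f=1.0):
    """Average predictive check-loss risk of q(X_i; tau2), exact in Y but
    conditional on the observed X (a simulation-only oracle quantity)."""
    total = 0.0
    for th, x, b in zip(thetas, xs, bs):
        alpha = tau2 / (tau2 + v2_p)
        q = alpha * x + sqrt(v2_f + alpha * v2_p) * Z.inv_cdf(b)
        total += sqrt(v2_f) * G((q - th) / sqrt(v2_f), b)
    return total / len(thetas)

random.seed(2)
n = 500
thetas = [random.gauss(0.0, 1.0) for _ in range(n)]   # true prior: N(0, 1)
xs = [random.gauss(th, 1.0) for th in thetas]         # past data, v2_P = 1
bs = [random.uniform(0.5, 0.95) for _ in range(n)]    # per-coordinate levels

grid = [j / 10 for j in range(1, 201)]                # tau^2 in (0, 20]
tau2_or = min(grid, key=lambda t: avg_risk(thetas, xs, bs, t))
print("oracle tau^2 on grid :", tau2_or)
print("risk, oracle-tuned   :", round(avg_risk(thetas, xs, bs, tau2_or), 4))
print("risk, naive predictor:", round(avg_risk(thetas, xs, bs, 1e9), 4))
```

With the hierarchy exactly correct (here τ₀² = 1), the oracle-tuned predictor should show a visibly lower average risk than the naive one, mirroring the efficiency gaps reported above.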
25 Recap. The problem involves multivariate normal means. The goal is quantile predictions of future observations. A shrinkage predictor is produced:
o Shrinkage is produced by a hierarchical conjugate setup.
o Quality of quantile prediction is measured through predictive check-loss, which is the natural loss for quantile prediction.
The methodology involves creation and use of a complicated but computable Asymptotic Risk Estimate (ÂRE) involving Hermite polynomials. The method is asymptotically justified, and appears to work well in simulations for moderate n.
26 Concluding Remarks for BFF4.
1. The quantile predictor methodology is motivated from a Bayesian hierarchical structure.
2. The form of the predictor then drives the methodology.
3. The ÂRE method is a CONDITIONAL FREQUENTIST notion.
4. BUT: The proposed ÂRE predictor is NOT BAYESIAN. It doesn't result from any prior I can think of, not even asymptotically.
5. Well known, but often overlooked: Prediction is different from parameter estimation. An estimate of the sample quantile doesn't directly lead to quantile prediction.
More informationLecture 2: Basic Concepts of Statistical Decision Theory
EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture
More informationBayesian Dropout. Tue Herlau, Morten Morup and Mikkel N. Schmidt. Feb 20, Discussed by: Yizhe Zhang
Bayesian Dropout Tue Herlau, Morten Morup and Mikkel N. Schmidt Discussed by: Yizhe Zhang Feb 20, 2016 Outline 1 Introduction 2 Model 3 Inference 4 Experiments Dropout Training stage: A unit is present
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationWhy Bayesian? Rigorous approach to address statistical estimation problems. The Bayesian philosophy is mature and powerful.
Why Bayesian? Rigorous approach to address statistical estimation problems. The Bayesian philosophy is mature and powerful. Even if you aren t Bayesian, you can define an uninformative prior and everything
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric
More informationEstimation of large dimensional sparse covariance matrices
Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)
More informationWhy experimenters should not randomize, and what they should do instead
Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationCS221 / Autumn 2017 / Liang & Ermon. Lecture 15: Bayesian networks III
CS221 / Autumn 2017 / Liang & Ermon Lecture 15: Bayesian networks III cs221.stanford.edu/q Question Which is computationally more expensive for Bayesian networks? probabilistic inference given the parameters
More informationSimple Linear Regression
Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring
More informationDiscussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance
Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance by Casarin, Grassi, Ravazzolo, Herman K. van Dijk Dimitris Korobilis University of Essex,
More informationFREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE
FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia
More information