Generalized Cp (GCp) in a Model Lean Framework
1 Generalized Cp (GCp) in a Model Lean Framework

Linda Zhao, University of Pennsylvania
Dedicated to Lawrence Brown ( )
September 9th, 2018, WHOA 3

Joint work with Larry Brown, Juhui Cai, Arun Kumar Kuchibhotla, and the Wharton Team: Richard Berk, Andreas Buja, Ed George, Weijie Su
2 Table of Contents

- Introduction
  - Conventional Linear Model
  - Assumption Lean Framework
- OLS and Predictive Risk under the Model Lean Framework
- Generalized $C_p$ ($GC_p$)
  - Definition
  - Properties
  - An alternative: bootstrap $GC_p$
- Distribution of the Difference in $GC_p$'s
- Simulations
- Summary and Ongoing Research
3 Table of Contents

1 Introduction
2 OLS and Predictive Risk under the Model Lean Framework
3 Generalized $C_p$ ($GC_p$)
4 Distribution of the $GC_p$ Difference
5 Simulations
6 Summary
4 The Conventional Linear Model

The conventional linear model assumes

$$Y = X\beta + \epsilon \qquad (1)$$

- $Y_{N \times 1}$ is the response vector
- $X_{N \times r}$ holds the $r$ predictors
- $\beta_{r \times 1}$ is the vector of parameters
- $\epsilon \sim N(0, \sigma^2 I_{N \times N})$
5 Linear Model Violation

OFTEN, the model assumptions may not hold!

- Nonlinearity
- Heteroscedasticity
- Missing important variables
6 Assumption Lean Setup

We proceed without many of the usual restrictions:

- without assuming a well-specified linear model
- allowing a random design
- without homoscedasticity
7 Assumption Lean Setup: a Well-defined $\beta$

Assumption Lean Framework: observe a sample $(X_i, Y_i)$ with $X_i \in \mathbb{R}^r$, $(X_i, Y_i) \stackrel{iid}{\sim} F$.

No assumptions are made about $F$ other than the existence of low-order moments.

A well-defined parameter $\beta$:

$$\beta = \operatorname{argmin}_b E_F\left[\left(Y - X'b\right)^2\right] = \left[E\left(XX'\right)\right]^{-1} E\left[XY\right]. \qquad (2)$$
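Because $\beta$ in (2) is just a functional of moments of $F$, it can be checked numerically with no linear-model assumption at all. The sketch below is our own toy illustration (the quadratic truth and all constants are invented, not from the slides): even when $E[Y \mid X]$ is nonlinear, the plug-in version of (2) converges to a well-defined best linear approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A joint distribution F with a *nonlinear* mean and a random design:
# no linear model is well-specified here.
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
Y = X[:, 1] ** 2 + rng.normal(0, 0.5, n)

# beta = E[XX']^{-1} E[XY], approximated by sample moments (this is OLS).
beta = np.linalg.solve(X.T @ X / n, X.T @ Y / n)

# The best linear approximation to u^2 on Unif(-1,1):
# argmin_{a,b} E[(U^2 - a - bU)^2] gives a = E[U^2] = 1/3, b = 0.
print(beta)  # roughly [1/3, 0]
```

The point of the example is that `beta` targets a meaningful population quantity (the best linear predictor) even though the fitted line is a deliberate misspecification of the truth.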
8 Interpretation of $\beta$

$$\beta = \operatorname{argmin}_b E_F\left[\left(Y - X'b\right)^2\right] = \left[E\left(XX'\right)\right]^{-1} E\left[XY\right].$$

- It is a statistical functional
- The best linear approximation, or
- The best linear prediction, or
- The linear portion in a semi-parametric model
- It has the same meaning as in the linear model when all the usual assumptions hold

See Buja et al. (2014, 2016).
9 $\beta_p$ of a Sub-Model

Let $M_p$ be a sub-model where

$$M_p = \{i_1, i_2, \ldots, i_p\} \subseteq \{1, \ldots, r\}, \quad p \le r$$

and $X_p$ contains only $(x_{i_1}, x_{i_2}, \ldots, x_{i_p})$. Then

$$\beta_p = \operatorname{argmin}_b E_F\left[\left(Y - X_p'b\right)^2\right] = \left[E\left(X_p X_p'\right)\right]^{-1} E\left[X_p Y\right]$$

Note:
- For simplicity the submodel subscript will be dropped when unnecessary
- $\beta_p$ is defined within $M_p$
11 Model Lean Framework: the OLS Estimate $\hat\beta$

Given data $X$ and $Y$ in the usual sample matrix form, the natural estimate of $\beta$ is the least squares estimate:

$$\hat\beta = (X'X)^{-1} X'Y. \qquad (3)$$

Goals:
1 Properties of the OLS $\hat\beta$
2 A criterion to choose a good submodel
3 Properties of that criterion
12 Properties of OLS

Asymptotic sandwich formula:

$$\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{Dist} N(0, \Sigma_{sand}) \qquad (4)$$

where

$$\Sigma_{sand} = \left[E\left(XX'\right)\right]^{-1} E\left[XX'\left(Y - X'\beta\right)^2\right] \left[E\left(XX'\right)\right]^{-1}.$$

See White, Halbert (1980a, b).
13 The Sandwich Estimator

A simple (and rather naïve) plug-in yields the sandwich estimator:

$$\hat\Sigma_{sand} = \left[n^{-1} X'X\right]^{-1} \left\{n^{-1} \sum_i \hat\rho_i^2\, X_i X_i'\right\} \left[n^{-1} X'X\right]^{-1} \qquad (5)$$

where $\hat\rho = Y - X\hat\beta$.

See White, Halbert (1980a, b).
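As an illustration (our own toy example, with invented constants), the plug-in estimator (5) takes only a few lines, and contrasting it with the homoskedastic covariance estimate $\hat\sigma^2 (X'X)^{-1}$ shows why the sandwich is needed when $\mathrm{Var}(Y \mid X)$ depends on $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Heteroskedastic data: noise scale grows with |x|, so the usual
# sigma^2 (X'X)^{-1} covariance is inconsistent; the sandwich is not.
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
Y = 1.0 + 2.0 * x + rng.normal(0, 1, n) * np.abs(x)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
rho = Y - X @ beta_hat                        # residuals

bread = np.linalg.inv(X.T @ X / n)            # [n^{-1} X'X]^{-1}
meat = (X * (rho**2)[:, None]).T @ X / n      # n^{-1} sum rho_i^2 X_i X_i'
sigma_sand = bread @ meat @ bread             # plug-in sandwich (5)

# Homoskedastic ("model-trusting") alternative, for contrast:
sigma_naive = rho.var() * bread
print(np.diag(sigma_sand), np.diag(sigma_naive))
```

In this design the two slope variances differ by a factor of about two (the population values are 1.8 for the sandwich and 1.0 for the naïve form), while the intercept variances agree; that is the hallmark of heteroskedasticity that interacts with the regressor.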
14 The Sandwich Estimator

Theorem (Kuchibhotla et al. 2018)
Under mild assumptions, the sandwich estimator $\hat\Sigma_{sand}$ is a consistent estimator of $\Sigma_{sand}$, i.e., $\hat\Sigma_{sand} \xrightarrow{P} \Sigma_{sand}$. Moreover, $\hat\Sigma_{sand}$ is a semi-parametrically efficient estimator of $\Sigma_{sand}$.
15 Model Lean Framework: Predictive Risk

Contemplate a future observation $(X, Y) \sim F$. For any submodel $M_p$, the predictive risk of the least squares estimate is

$$R_p \equiv E_F\left[\left(Y - X_p'\hat\beta_p\right)^2\right]. \qquad (6)$$

We next:
- Propose a good estimator, $GC_p$, for $R_p$
- Study the properties of $GC_p$
- Derive the distribution of the $GC_p$ difference
17 Estimation of Predictive Risk: $GC_p$

Define the Generalized $C_p$ ($GC_p$) as follows:

$$GC_p = n^{-1} SSE + 2 n^{-1} \hat\xi^2 \qquad (7)$$

where

$$SSE = \|Y - X\hat\beta\|^2 \qquad (8a)$$

$$\hat\xi^2 = \operatorname{tr}\left[\left(\frac{X'X}{n}\right)^{-1} \left(\frac{X' D_r^2 X}{n}\right)\right] \qquad (8b)$$

and $D_r^2$ is the diagonal matrix with $D_{r,ii}^2 = (Y_i - X_i'\hat\beta)^2$.
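Equations (7)–(8b) transcribe almost directly into code. The sketch below is our own; the data at the end are an invented sanity check, exploiting the fact that under a well-specified homoskedastic model $GC_p$ should land near the predictive risk $\sigma^2(1 + p/n)$:

```python
import numpy as np

def gcp(X, Y):
    """Generalized C_p of eq. (7): n^{-1} SSE + 2 n^{-1} xi_hat^2."""
    n = len(Y)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta_hat
    sse = resid @ resid
    # xi^2 = tr[(X'X/n)^{-1} (X' D_r^2 X / n)], with D_r^2 = diag(resid^2)
    xi2 = np.trace(np.linalg.solve(X.T @ X / n,
                                   (X * (resid**2)[:, None]).T @ X / n))
    return sse / n + 2 * xi2 / n

# Sanity check on a well-specified homoskedastic model (sigma^2 = 1):
rng = np.random.default_rng(2)
n, p = 20_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
val = gcp(X, Y)
print(val)  # close to sigma^2 (1 + p/n) = 1.00015
```

Note that no model assumption enters the function itself: `gcp` only uses residuals and design moments, which is exactly what makes it valid in the assumption lean framework.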
18 Properties of $GC_p$

Theorem I
$GC_p$ is a consistent estimator of the predictive risk $R$, i.e., $GC_p \xrightarrow{P} R$.

Remark: The theorems hold under mild assumptions, such as the existence of moments.
19 $GC_p$: Derivation

$$
\begin{aligned}
R \equiv E\left[(Y - X'\hat\beta)^2\right]
&= E\left[(Y - X'\beta)^2\right] + E\left[\left(X'(\hat\beta - \beta)\right)^2\right] \\
&\approx E\left[(Y - X'\beta)^2\right] + n^{-1} E\left[X'\, \Sigma_{sand}\, X\right]
&& \text{Sandwich} \\
&\approx n^{-1}\|Y - X\beta\|^2 + n^{-1}\operatorname{tr}\left(\hat\Sigma_{sand}\, X'X/n\right)
&& \text{Empirical moment} \\
&\approx n^{-1}\|Y - X\hat\beta\|^2 + n^{-1}\operatorname{tr}\left(\hat\Sigma_{sand}\, X'X/n\right) + n^{-1}\operatorname{tr}\left(\hat\Sigma_{sand}\, X'X/n\right) \\
&= n^{-1}\|Y - X\hat\beta\|^2 + 2 n^{-1}\operatorname{tr}\left[\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X' D_r^2 X}{n}\right)\right] \\
&= n^{-1}\|Y - X\hat\beta\|^2 + 2 n^{-1}\hat\xi^2 \;\equiv\; GC_p.
\end{aligned}
$$
20 $GC_p^{boot}$: $GC_p$ through the Bootstrap

An alternative estimator is obtained through the M-of-N bootstrap:

$$GC_p^{boot} \equiv n^{-1}\|Y - X\hat\beta\|^2 + 2\operatorname{tr}\left(n^{-1} X'X\, \hat\Sigma_{boot}\right) \qquad (9)$$

where

$$\hat\Sigma_{boot} = \frac{1}{n_{boot}} \sum_{i=1}^{n_{boot}} (\hat\beta_i^{bt} - \hat\beta)(\hat\beta_i^{bt} - \hat\beta)'$$

and $\hat\beta_i^{bt}$ is an M-of-N bootstrap OLS estimator.
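A minimal sketch of (9) is below. One detail is our reading rather than something spelled out on the slide: since the M-of-N bootstrap replicates $\hat\beta_i^{bt}$ fluctuate at sample size $M$, we rescale their empirical covariance by $M/n$ so that $\hat\Sigma_{boot}$ targets $\mathrm{Cov}(\hat\beta)$ at sample size $n$ (which also makes the $M \to \infty$ limit of Theorem II come out right). All constants in the usage example are invented.

```python
import numpy as np

def gcp_boot(X, Y, M, n_boot=300, seed=0):
    """Sketch of the M-of-N bootstrap GC_p of eq. (9).

    The M/n rescaling of the bootstrap covariance is our assumption:
    it maps variability at resample size M back to sample size n.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    sse = np.sum((Y - X @ beta_hat) ** 2)
    betas = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=M)      # draw M of the n (X_i, Y_i) pairs
        betas[b] = np.linalg.lstsq(X[idx], Y[idx], rcond=None)[0]
    sigma_boot = (M / n) * np.cov(betas, rowvar=False, bias=True)
    return sse / n + 2 * np.trace(X.T @ X / n @ sigma_boot)

# Usage on a well-specified homoskedastic model, where the answer
# should sit near the predictive risk sigma^2 (1 + p/n) = 1:
rng = np.random.default_rng(3)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
val = gcp_boot(X, Y, M=n // 2)
print(val)
```

Resampling whole $(X_i, Y_i)$ pairs (rather than residuals) is what keeps the bootstrap valid under random design and heteroskedasticity.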
21 $GC_p$ and $GC_p^{boot}$

To compare $GC_p$ and $GC_p^{boot}$ we have

Theorem II
$GC_p$ is the limit of the M-of-N bootstrap $GC_p^{boot}$ as $M \to \infty$ for a fixed sample of size $n$, i.e.,

$$\lim_{M \to \infty} GC_p^{boot} = GC_p.$$

Note: $GC_p$ and $GC_p^{boot}$ differ for fixed $n$.
22 Remark: $GC_p$ and Mallows' $C_p$

Mallows' version for a sub-model of size $p$ is

$$C_p = (SSE_p / \hat\sigma_r^2) - n + 2p \qquad (10)$$

with $C_p^U$, an alternate form of $C_p$:

$$C_p^U = n^{-1} SSE_p + 2 n^{-1} p \hat\sigma_r^2 \qquad (11)$$

$GC_p$ and Mallows' $C_p$ are very different! Mallows' $C_p$ is for a fixed design, and all the related results hold only under strict linear model assumptions. Comparisons and examples are presented in our paper.
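The difference is easy to see numerically. In the sketch below (our own example, with an invented heteroskedastic design), the $GC_p$ penalty $2\hat\xi^2/n$ is roughly twice the Mallows-type penalty $2p\hat\sigma^2/n$, so $C_p^U$ systematically understates the cost of the fitted coefficients:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = x + rng.normal(size=n) * np.abs(x)        # heteroskedastic errors

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
p = X.shape[1]
sigma2_hat = resid @ resid / (n - p)

# Mallows-type penalty from C_p^U: 2 p sigma_hat^2 / n
pen_mallows = 2 * p * sigma2_hat / n

# GC_p penalty: 2 xi_hat^2 / n, with xi^2 as in eq. (8b)
xi2 = np.trace(np.linalg.solve(X.T @ X / n,
                               (X * (resid**2)[:, None]).T @ X / n))
pen_gcp = 2 * xi2 / n

print(pen_gcp / pen_mallows)   # close to 2 for this design
```

Here the population value of the ratio is exactly 2 (the sandwich "meat" for the slope is $E[x^4] = 3$ against the homoskedastic $E[x^2]\,\hat\sigma^{-2}$-style count of 1 per coefficient plus 1 for the intercept), which is why Mallows' $C_p$ can pick the wrong submodel once its assumptions fail.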
24 Comparison of Sub-Models

For simplicity, let $M_p \subseteq M_{p+q}$ be two nested sub-models where

- $M_p = \{1, \ldots, p\}$ with $\beta^p = (\beta_1^p, \ldots, \beta_p^p)$
- $M_{p+q} = \{1, \ldots, p+q\}$ with $\beta^{p+q} = (\beta_1^{p+q}, \ldots, \beta_{p+q}^{p+q})$

Goal: choose the model with $\min\{R_p, R_{p+q}\}$.

Question: how good is the decision based on $\Delta = GC_{p+q} - GC_p$?
25 Contiguity Setup

Question: how good is the decision based on $\Delta = GC_{p+q} - GC_p$?

Decisions based on $\Delta$ work well when the predictive risks of the two nested submodels are well-separated, i.e., $R_p - R_{p+q} = O(1)$.

The problem of interest is when the predictive risks of the two nested submodels are close, i.e., under the contiguity condition $R_p - R_{p+q} = O(1/n)$.
26 Distribution of $\Delta = GC_{p+q} - GC_p$

WLOG assume all the $X_i$'s are in their canonical form, i.e., $E(X_i) = 0$, $E(X_i X_j) = 0$ for $i \ne j$, and $E(X_i^2) = 1$. Consider the two nested models $M_p \subseteq M_{p+q}$.

Theorem III
Under the contiguous setting, i.e., $R_{p+q} - R_p = O(1/n)$, and assuming the canonical conditions for $X$, then for the two nested models $M_p \subseteq M_{p+q}$,

$$n\,(GC_{p+q} - GC_p) \to c_1 \|Z\|^2 + c_2$$

in distribution.

Note: $Z$ follows a multivariate normal distribution, and $G$ denotes the CDF of $\|Z\|^2$.
27 Distribution of $\Delta = GC_{p+q} - GC_p$

As a special case of Theorem III, we have the following

Corollary
In addition to the canonical form, assume
- the full model is well-specified, and
- homoscedasticity, i.e., $\operatorname{Var}(Y_i \mid X_i) = \sigma^2 = 1$.

Then

$$n\,(GC_{p+q} - GC_p) \to -\chi_q^2\left(n\,\|\beta_{[p+1,\ldots,p+q]}\|^2\right) + 2q$$

and

$$n\,(R_{p+q} - R_p) = q - n\,\|\beta_{[p+1,\ldots,p+q]}\|^2$$
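The null case of the corollary (the extra $q$ coefficients all zero, so the noncentrality vanishes) is easy to check by Monte Carlo. The sketch below is our own simulation with invented $n$, $q$, and replication counts; it re-implements the $GC_p$ formula (7)–(8b) and checks that $n(GC_{p+q} - GC_p)$ behaves like $2q - \chi_q^2$, whose mean is $q$ and whose chance of being negative is $P(\chi_q^2 > 2q)$:

```python
import numpy as np

def gcp(X, Y):
    # GC_p of eq. (7), with xi^2 as in eq. (8b)
    n = len(Y)
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    r = Y - X @ b
    xi2 = np.trace(np.linalg.solve(X.T @ X / n,
                                   (X * (r**2)[:, None]).T @ X / n))
    return r @ r / n + 2 * xi2 / n

rng = np.random.default_rng(5)
n, p, q, reps = 1_000, 2, 3, 400
diffs = []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p + q - 1))])
    Y = X[:, :p] @ np.ones(p) + rng.normal(size=n)   # the extra q coefs are 0
    diffs.append(n * (gcp(X, Y) - gcp(X[:, :p], Y)))
diffs = np.array(diffs)

# Corollary (null case): limit is 2q - chi^2_q, so E = 2q - q = q.
print(diffs.mean(), (diffs < 0).mean())
```

With $q = 3$ the mean should hover near 3, and the fraction of negative differences (i.e., cases where $GC_p$ would wrongly prefer the larger model) near $P(\chi_3^2 > 6) \approx 0.11$, matching the non-vanishing selection error that motivates studying this distribution.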
28 $P\left(GC_{p+q} - GC_p < 0\right)$ as a function of $\rho$ and $q$

[Figure omitted in transcription: curves of $P(GC_{p+q} - GC_p < 0)$ against $\rho$ for several $q$.]

- $\rho = n\,\|\beta_{[p+1,\ldots,p+q]}\|^2 / q$
- $\rho > 1 \Leftrightarrow R_{p+q} < R_p$
- $\rho = 0 \Leftrightarrow \beta_{[p+1,\ldots,p+q]} = 0 \Leftrightarrow M_p = M_{p+q}$
29 P(Choosing the model with smaller $R$)

[Figure omitted in transcription: curves of the probability of choosing the smaller-risk model against $\rho$ for several $q$.]

- $\rho = n\,\|\beta_{[p+1,\ldots,p+q]}\|^2 / q$
- $\rho > 1 \Leftrightarrow R_{p+q} < R_p$
- $\rho = 0 \Leftrightarrow \beta_{[p+1,\ldots,p+q]} = 0 \Leftrightarrow M_p = M_{p+q}$
31 Set-up

- Sample size $n = 100, 1000$
- $X_1 = 1$, $X_i = \sqrt{2}\,\cos\left(\pi (i-1) U\right)$, $i = 2, \ldots, m+1 = r$, where $U \stackrel{iid}{\sim} \mathrm{Unif}(-1, 1)$
- The design is in canonical form: $E(X_i^2) = 1$, $i = 1, \ldots, r$, and $E(X_i X_j) = 0$ for $i \ne j$
- $\beta_p^2 = \dfrac{E\left(X_p^2\,\sigma^2(X)\right) + \|\beta_{[-p]}\|^2}{n+p-1}$
- $Y = X'\beta + \epsilon$ where $\epsilon \stackrel{iid}{\sim} N(0, 1)$
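The cosine design above is convenient because it is exactly canonical: distinct frequencies of $\cos(\pi k U)$ are orthonormal (after the $\sqrt{2}$ scaling) under $U \sim \mathrm{Unif}(-1,1)$. A quick numerical check of this, with our own arbitrary choice of $n$ and $r$, is that the Gram matrix of the simulated design approaches the identity:

```python
import numpy as np

def cosine_design(n, r, rng):
    # X_1 = 1, X_i = sqrt(2) cos(pi (i-1) U), one U ~ Unif(-1, 1)
    # per observation shared across all r columns
    U = rng.uniform(-1, 1, size=n)
    cols = [np.ones(n)] + [np.sqrt(2) * np.cos(np.pi * k * U)
                           for k in range(1, r)]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
X = cosine_design(200_000, 6, rng)
G = X.T @ X / len(X)       # sample Gram matrix: should be near I_6
print(np.round(G, 3))
```

Canonicity matters here because Theorem III and its corollary are stated under exactly these moment conditions, so the simulation probes the theory directly rather than after an extra orthogonalization step.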
32 Goal

For each of the models $M_1, M_2, \ldots, M_r$, approximate the risk by averaging over 10,000 replications:

$$R_{M_i} \approx \sum_{j=1}^{10{,}000} GC_{p, M_i}^{(j)} \big/\, 10{,}000$$
34 Summary

- Set up the assumption lean framework to explore the relationship between $Y$ and $X$.
- Studied the OLS estimator and the predictive risk $R$ under the assumption lean framework.
- Proposed the Generalized $C_p$ ($GC_p$) and an alternative, $GC_p^{boot}$, to estimate the predictive risk.
- Derived the distribution of the $GC_p$ difference between nested models.
35 Ongoing and Future Research

- Optimality of $GC_p$-based decision rules.
- A general formulation of the distribution of the $GC_p$ difference between non-nested models.
- $GC_p$ for the Generalized Linear Model (GLM).
- Semi-supervised regression.
36 References

Buja, A., Berk, R., Brown, L., George, E., Pitkin, E., Traskin, M., ... & Zhao, L. (2014). Models as approximations, Part I: A conspiracy of nonlinearity and random regressors in linear regression. arXiv preprint.

Buja, A., Berk, R., Brown, L., George, E., Kuchibhotla, A. K., & Zhao, L. (2016). Models as approximations, Part II: A general theory of model-robust regression. arXiv preprint.

Kuchibhotla, A. K., Brown, L. D., Buja, A., George, E. I., & Zhao, L. (2018). Valid post-selection inference in assumption-lean linear regression. arXiv preprint.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationIntroduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data
Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation
More informationPolitical Science 236 Hypothesis Testing: Review and Bootstrapping
Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More information3 Multiple Linear Regression
3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationReview of Econometrics
Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,
More informationConfounder Adjustment in Multiple Hypothesis Testing
in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationSummer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.
Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall
More information1 Mixed effect models and longitudinal data analysis
1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between
More informationEmpirical Economic Research, Part II
Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction
More information1 Appendix A: Matrix Algebra
Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix
More informationDiscrete Dependent Variable Models
Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationHigh-dimensional regression with unknown variance
High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationMATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests
1/16 MATH 829: Introduction to Data Mining and Analysis Linear Regression: statistical tests Dominique Guillot Departments of Mathematical Sciences University of Delaware February 17, 2016 Statistical
More informationStatistics: A review. Why statistics?
Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval
More informationQuantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be
Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F
More information