Sampling Distributions


Merlise Clyde, Duke University, September 3, 2015

Outline

Topics:
- Normal Theory
- Chi-squared Distributions
- Student t Distributions

Readings: Christensen, Appendix C and Chapters 1-2.

Prostate Example

```r
> library(lasso2); data(prostate)   # n = 97, 9 variables
> summary(lm(lpsa ~ ., data = prostate))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.669399   1.296381   0.516  0.60690
lcavol       0.587023   0.087920   6.677 2.11e-09 ***
lweight      0.454461   0.170012   2.673  0.00896 **
age         -0.019637   0.011173  -1.758  0.08229 .
lbph         0.107054   0.058449   1.832  0.07040 .
svi          0.766156   0.244309   3.136  0.00233 **
lcp         -0.105474   0.091013  -1.159  0.24964
gleason      0.045136   0.157464   0.287  0.77506
pgg45        0.004525   0.004421   1.024  0.30885
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7084 on 88 degrees of freedom
Multiple R-squared: 0.6548, Adjusted R-squared: 0.6234
F-statistic: 20.86 on 8 and 88 DF, p-value: < 2.2e-16
```

Summary of Distributions

Model (full): $Y = X\beta + \epsilon$. Assume $X$ is full rank with first column the vector of ones $1_n$ and $p$ additional predictors, so $r(X) = p + 1$.

$$\hat\beta \mid \sigma^2 \sim N\big(\beta, \sigma^2 (X^TX)^{-1}\big)$$

$$RSS/\sigma^2 \sim \chi^2_{n - r(X)}$$

$$\frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t_{n - r(X)}$$

where $SE(\hat\beta_j)$ is the square root of the $j$th diagonal element of $\hat\sigma^2 (X^TX)^{-1}$ and $\hat\sigma^2$ is the unbiased estimate of $\sigma^2$.
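To make these formulas concrete, here is a minimal simulation sketch (the settings $n$, $p$, $\beta$, $\sigma$ are illustrative assumptions, not from the slides) that recomputes $SE(\hat\beta_j)$ from $\hat\sigma^2(X^TX)^{-1}$ and checks it against summary(lm):

```r
## Sketch: recompute SE(beta.hat_j) by hand and compare to summary(lm).
## All settings here are illustrative assumptions.
set.seed(1)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * p), n, p))        # full-rank design, first column 1_n
beta <- c(1, 0.5, -0.5, 2); sigma <- 2
Y <- drop(X %*% beta + rnorm(n, sd = sigma))

fit <- lm(Y ~ X - 1)                             # X already includes the intercept
XtXinv <- solve(crossprod(X))                    # (X^T X)^{-1}
sigma2.hat <- sum(resid(fit)^2) / (n - ncol(X))  # unbiased estimate of sigma^2
se <- sqrt(sigma2.hat * diag(XtXinv))            # sqrt of jth diagonal element
cbind(by.hand = se, lm = coef(summary(fit))[, "Std. Error"])  # columns agree
```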

Distribution of $\hat\beta$

If $Y \sim N(X\beta, \sigma^2 I_n)$, then
$$\hat\beta \sim N\big(\beta, \sigma^2 (X^TX)^{-1}\big)$$

Unknown $\sigma^2$

$$\hat\beta_j \mid \beta_j, \sigma^2 \sim N\big(\beta_j, \sigma^2 [(X^TX)^{-1}]_{jj}\big)$$

What happens if we substitute $\hat\sigma^2 = e^Te/(n - r(X))$ in the above?

$$\frac{(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2 [(X^TX)^{-1}]_{jj}}}{\sqrt{e^Te\big/\big(\sigma^2 (n - r(X))\big)}} \;\overset{D}{=}\; \frac{N(0,1)}{\sqrt{\chi^2_{n - r(X)}/(n - r(X))}} \;\sim\; t(n - r(X), 0, 1)$$

We need to show that $e^Te/\sigma^2$ has a $\chi^2$ distribution and is independent of the numerator!

Central Student t Distribution

Definition: Let $Z \sim N(0, 1)$ and $S \sim \chi^2_p$ with $Z$ and $S$ independent. Then
$$W = \frac{Z}{\sqrt{S/p}}$$
has a (central) Student $t$ distribution with $p$ degrees of freedom.

See Casella & Berger or DeGroot & Schervish for the derivation (a nice change-of-variables and marginalization exercise!).
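A quick simulation check of this definition (purely illustrative):

```r
## W = Z / sqrt(S/p) should match Student t with p degrees of freedom
set.seed(2)
p <- 5
Z <- rnorm(1e5)               # Z ~ N(0, 1)
S <- rchisq(1e5, df = p)      # S ~ chi^2_p, independent of Z
W <- Z / sqrt(S / p)
rbind(simulated = quantile(W, c(0.025, 0.5, 0.975)),
      student.t = qt(c(0.025, 0.5, 0.975), df = p))
```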

Chi-Squared Distribution

Definition: If $Z \sim N(0, 1)$ then $Z^2 \sim \chi^2_1$ (a Chi-squared distribution with one degree of freedom).

Density:
$$f(x) = \frac{1}{\Gamma(1/2)} (1/2)^{1/2} x^{1/2 - 1} e^{-x/2}, \quad x > 0$$

Characteristic function:
$$\varphi(t) = E[e^{itZ^2}] = (1 - 2it)^{-1/2}$$

Chi-Squared Distribution with p Degrees of Freedom

If $Z_j \overset{iid}{\sim} N(0, 1)$ for $j = 1, \dots, p$, then $X \equiv Z^TZ = \sum_{j=1}^p Z_j^2 \sim \chi^2_p$.

Characteristic function:
$$\varphi_X(t) = E\Big[e^{it \sum_{j=1}^p Z_j^2}\Big] = \prod_{j=1}^p E[e^{itZ_j^2}] = \prod_{j=1}^p (1 - 2it)^{-1/2} = (1 - 2it)^{-p/2}$$

This is a Gamma distribution with shape $p/2$ and rate $1/2$, $G(p/2, 1/2)$, with density
$$f(x) = \frac{1}{\Gamma(p/2)} (1/2)^{p/2} x^{p/2 - 1} e^{-x/2}, \quad x > 0$$
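The sum-of-squares and Gamma characterizations can be verified the same way (an illustrative sketch):

```r
## Z^T Z for p iid standard normals vs. chi^2_p vs. Gamma(p/2, rate 1/2)
set.seed(3)
p <- 4
X <- replicate(1e5, sum(rnorm(p)^2))
probs <- c(0.25, 0.5, 0.75)
rbind(simulated = quantile(X, probs),
      chisq     = qchisq(probs, df = p),
      gamma     = qgamma(probs, shape = p / 2, rate = 1 / 2))
```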

Quadratic Forms

Theorem: Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. If $Q$ is a rank-$k$ orthogonal projection onto a subspace of $C(X)^\perp$, then $Y^TQY/\sigma^2 \sim \chi^2_k$.

Proof. For an orthogonal projection, $Q = U\Lambda U^T = U_k U_k^T$, where $C(Q) = C(U_k)$ and $U_k^T U_k = I_k$ (Spectral Theorem). Then $Y^TQY = Y^T U_k U_k^T Y$. Let $Z = U_k^T Y/\sigma \sim N(U_k^T \mu/\sigma, U_k^T U_k)$. Since $\mu \in C(X)$ and $C(U_k) \subseteq C(X)^\perp$, we have $U_k^T \mu = 0$, so $Z \sim N(0, I_k)$ and $Z^TZ \sim \chi^2_k$. Since $U_k^T Y/\sigma \overset{D}{=} Z$,
$$\frac{Y^TQY}{\sigma^2} \sim \chi^2_k$$
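A simulation sketch of the theorem (all settings illustrative): project $Y$ onto $C(X)^\perp$ and compare the quadratic form to $\chi^2_k$.

```r
## Y^T Q Y / sigma^2 ~ chi^2_k when Q projects onto C(X)^perp and mu is in C(X)
set.seed(4)
n <- 50; sigma <- 2
X <- cbind(1, rnorm(n))                     # r(X) = 2
mu <- drop(X %*% c(1, 2))                   # mean lies in C(X)
P <- X %*% solve(crossprod(X)) %*% t(X)     # projection onto C(X)
Q <- diag(n) - P                            # rank k = n - 2 projection onto C(X)^perp
qf <- replicate(2e4, {
  Y <- mu + rnorm(n, sd = sigma)
  drop(t(Y) %*% Q %*% Y) / sigma^2
})
probs <- c(0.25, 0.5, 0.75)
rbind(simulated = quantile(qf, probs),
      chisq     = qchisq(probs, df = n - 2))
```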

Example: Residual Sum of Squares (SSE)

Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. Because $\mu \in C(X)$, $I_n - P_X$ is an orthogonal projection onto $C(X)^\perp$ of rank $n - r(X)$, so
$$\frac{e^Te}{\sigma^2} = \frac{Y^T (I_n - P_X) Y}{\sigma^2} \sim \chi^2_{n - r(X)}$$
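The same check can be run through lm itself, where deviance() returns $e^Te$ (settings again illustrative):

```r
## RSS / sigma^2 across repeated fits vs. chi^2_{n - r(X)}
set.seed(5)
n <- 40; sigma <- 1.5
x <- rnorm(n)
rss <- replicate(2e4, {
  y <- 1 + 2 * x + rnorm(n, sd = sigma)
  deviance(lm(y ~ x))                 # e^T e, the residual sum of squares
})
probs <- c(0.25, 0.5, 0.75)
rbind(simulated = quantile(rss / sigma^2, probs),
      chisq     = qchisq(probs, df = n - 2))
```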

Estimated Coefficients and Residuals are Independent

If $Y \sim N(X\beta, \sigma^2 I_n)$, then $\mathrm{Cov}(\hat\beta, e) = 0$, which, under joint normality, implies independence.

Functions of independent random variables are independent (show that the characteristic functions or densities factor).
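The covariance identity itself is a one-line matrix fact, $\mathrm{Cov}(\hat\beta, e) = \sigma^2 (X^TX)^{-1} X^T (I - P_X) = 0$, since $X^T(I - P_X) = 0$. A numerical sketch with an illustrative design:

```r
## (X^T X)^{-1} X^T (I - P_X) = 0, hence Cov(beta.hat, e) = 0
set.seed(6)
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))
XtXinv <- solve(crossprod(X))
P <- X %*% XtXinv %*% t(X)                    # hat matrix P_X
max(abs(XtXinv %*% t(X) %*% (diag(n) - P)))   # ~ 0 up to rounding error
```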

Putting It All Together

- $\hat\beta \sim N\big(\beta, \sigma^2 (X^TX)^{-1}\big)$, so $(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2 [(X^TX)^{-1}]_{jj}} \sim N(0, 1)$
- $e^Te/\sigma^2 \sim \chi^2_{n - r(X)}$
- $\hat\beta$ and $e$ are independent

Therefore (see the simulation sketch below)
$$\frac{(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2 [(X^TX)^{-1}]_{jj}}}{\sqrt{e^Te\big/\big(\sigma^2 (n - r(X))\big)}} = \frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t(n - r(X), 0, 1)$$
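A simulation sketch of the resulting pivot (settings illustrative): the standardized slope from repeated fits should match $t_{n - r(X)}$.

```r
## (beta.hat_j - beta_j) / SE(beta.hat_j) vs. Student t with n - r(X) df
set.seed(7)
n <- 25; beta1 <- 2; sigma <- 1
x <- rnorm(n)
tstat <- replicate(2e4, {
  y <- 1 + beta1 * x + rnorm(n, sd = sigma)
  fit <- lm(y ~ x)
  (coef(fit)["x"] - beta1) / coef(summary(fit))["x", "Std. Error"]
})
probs <- c(0.025, 0.5, 0.975)
rbind(simulated = quantile(tstat, probs),
      student.t = qt(probs, df = n - 2))
```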

Inference

95% confidence interval: $\hat\beta_j \pm t_{\alpha/2}\, SE(\hat\beta_j)$

- use qt(a, df) for the $t_a$ quantile
- derived from the pivotal quantity $t = (\hat\beta_j - \beta_j)/SE(\hat\beta_j)$, where $P\big(t \in (t_{\alpha/2},\, t_{1-\alpha/2})\big) = 1 - \alpha$
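A minimal check that the pivotal-quantity interval matches confint(), assuming the lasso2 package and its prostate data load as shown on the earlier slide:

```r
## Hand-built 95% interval for lcavol vs. confint()
library(lasso2); data(prostate)
prostate.lm <- lm(lpsa ~ ., data = prostate)
est <- coef(prostate.lm)["lcavol"]
se  <- coef(summary(prostate.lm))["lcavol", "Std. Error"]
df  <- prostate.lm$df.residual                # n - r(X) = 88
est + c(-1, 1) * qt(0.975, df = df) * se      # beta.hat_j +/- t_{alpha/2} SE
confint(prostate.lm)["lcavol", ]              # same interval
```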

Prostate Example

xtable(confint(prostate.lm)), using library(MASS) and library(xtable):

```r
              2.5 %  97.5 %
(Intercept)   -1.91    3.25
lcavol         0.41    0.76
lweight        0.12    0.79
age           -0.04    0.00
lbph          -0.01    0.22
svi            0.28    1.25
lcp           -0.29    0.08
gleason       -0.27    0.36
pgg45         -0.00    0.01
```

Interpretation

For a one-unit increase in $X_j$, we expect $Y$ to change by $\hat\beta_j \pm t_{\alpha/2}\, SE(\hat\beta_j)$.

For log transforms, where $Y = \exp(X\beta + \epsilon) = \prod_j \exp(X_j \beta_j)\, \exp(\epsilon)$: if $X_j = \log(W_j)$, look at 2-fold or percentage increases in $W_j$ to obtain the multiplicative increase in the median of $Y$.

If cavol increases by 10%, we expect PSA to increase by a factor of $1.10^{\text{CI}} = (1.0398, 1.0751)$, i.e., by 3.98 to 7.51 percent. For a 10% increase in cancer volume, we are 95% confident that PSA levels will increase by approximately 4 to 7.5 percent.
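This multiplicative interpretation can be reproduced directly, continuing the prostate fit from the sketch above:

```r
## Effect on median PSA of a 10% increase in cancer volume
1.10^confint(prostate.lm)["lcavol", ]   # approx (1.0398, 1.0751): a 4 to 7.5% increase
```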

Derivation