Statistical Models. ref: chapter 1 of Bates, D and D. Watts (1988) Nonlinear Regression Analysis and its Applications, Wiley. Dave Campbell 2009



1 Statistical Models. Ref: chapter 1 of Bates, D. and D. Watts (1988), Nonlinear Regression Analysis and its Applications, Wiley. Dave Campbell, 2009

2 Today: linear regression in terms of the response surface; the geometry of linear regression.

3 Linear Regression. The question: can arm span be used as a predictor for height? We want to use the model: Height = Armspan × β + error.

4 X is an n×1 vector of arm span values and Y is an n×1 vector of heights. We have n = 2 observations. Using a zero intercept, we have one parameter, β. β gives a distance along the X direction vector, so the set $\{X\beta : -\infty < \beta < \infty\}$ defines the response surface.

5 Linear regression selects the value of β that maximizes the likelihood for β. The model is $Y = X\beta + Z$, where Z is the error (disturbance) vector. The model can be divided into a deterministic part, $X\beta$, and a stochastic part, Z.

6 $X\beta$ is the response surface, or expectation function. X is the derivative matrix: it is the derivative of the expectation surface with respect to the parameters.

7 $\hat\beta = (X'X)^{-1} X'Y$ comes from the likelihood, from finding a minimum variance unbiased estimator, or from a Bayesian method with a specific prior.
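For the one-parameter arm span model, $X'X$ and $X'Y$ are scalars, so the estimate can be computed by hand. The following is a minimal sketch in plain Python; the function name `fit_through_origin` and the two data points are hypothetical and do not come from the slides.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = x * beta + error."""
    sxx = sum(xi * xi for xi in x)                # X'X (a scalar when p = 1)
    sxy = sum(xi * yi for xi, yi in zip(x, y))    # X'Y
    beta_hat = sxy / sxx                          # (X'X)^{-1} X'Y
    residuals = [yi - xi * beta_hat for xi, yi in zip(x, y)]
    return beta_hat, residuals


# Hypothetical arm span and height measurements in cm (n = 2 observations).
arm_span = [150.0, 180.0]
height = [155.0, 178.0]

beta_hat, residuals = fit_through_origin(arm_span, height)

# Geometrically, X * beta_hat is the point on the response surface closest
# to y, so the residual vector is orthogonal to X: X'(y - X beta_hat) = 0.
orthogonality = sum(xi * ri for xi, ri in zip(arm_span, residuals))
```

The `orthogonality` value should be zero up to rounding error, which is the geometric statement that the fitted point is the projection of Y onto the line spanned by X.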

8 Intervals. The sampling theory approach is based on the notion that if the experiment were repeated many times and the method applied each time, then in the long run the interval estimates would contain the true parameter 100(1 - α)% of the time.
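The repeated-experiment interpretation can be illustrated by simulation. The sketch below assumes the usual t interval for the slope of the no-intercept model, and the true slope, error standard deviation, and arm span values are all hypothetical. The critical value 12.706 is the upper 2.5% point of the t distribution with 1 degree of freedom, which is what N = 2 and p = 1 give.

```python
import math
import random


def slope_interval(x, y, t_crit):
    """t interval for the slope of the no-intercept model (p = 1)."""
    sxx = sum(xi * xi for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    s2 = sum((yi - xi * beta_hat) ** 2 for xi, yi in zip(x, y)) / (len(x) - 1)
    half_width = t_crit * math.sqrt(s2 / sxx)
    return beta_hat - half_width, beta_hat + half_width


def coverage(true_beta, sigma, x, t_crit, n_repeats, seed=0):
    """Fraction of simulated experiments whose interval contains true_beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        # Repeat the experiment: same x, fresh normally distributed errors.
        y = [xi * true_beta + rng.gauss(0.0, sigma) for xi in x]
        lower, upper = slope_interval(x, y, t_crit)
        hits += lower <= true_beta <= upper
    return hits / n_repeats


# Hypothetical settings: true slope 1.0, error sd 5 cm, two arm spans.
rate = coverage(true_beta=1.0, sigma=5.0, x=[150.0, 180.0],
                t_crit=12.706, n_repeats=20000)
```

With α = 0.05, `rate` should come out close to 0.95. Any single interval either contains the true slope or does not; the 95% describes the procedure over many repetitions.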

9 Intervals: sampling theory approach. $\hat\beta$ is normally distributed because $\hat\beta$ is a linear function of Y, which is a linear function of Z, and Z is assumed normally distributed. $\hat\beta$ is unbiased, with $\mathrm{var}(\hat\beta) = \sigma^2 (X'X)^{-1}$.

10 CIs from sampling theory. A 1 - α marginal confidence interval for the kth parameter $\beta_k$ is $\hat\beta_k \pm \mathrm{se}(\hat\beta_k)\, t_{N-p,\alpha/2}$, where $t_{N-p,\alpha/2}$ is the α/2 cut-off of the t distribution with N - p degrees of freedom, $\mathrm{se}(\hat\beta_k) = s \sqrt{\{(X'X)^{-1}\}_{kk}}$, and $\{(X'X)^{-1}\}_{kk}$ is the (k, k) entry of that matrix.
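A minimal sketch of the marginal interval for the one-parameter arm span model, in plain Python. The data are hypothetical, and the critical value is passed in as an argument rather than computed; 12.706 is the upper 2.5% point of the t distribution with N - p = 1 degree of freedom.

```python
import math


def marginal_interval(x, y, t_crit):
    """1 - alpha marginal interval for the slope when p = 1."""
    n, p = len(x), 1
    sxx = sum(xi * xi for xi in x)                       # X'X
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - xi * beta_hat) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - p)                                   # s^2 = ||y - X beta_hat||^2 / (N - p)
    se = math.sqrt(s2 / sxx)                             # s * sqrt({(X'X)^{-1}}_kk)
    return beta_hat - t_crit * se, beta_hat + t_crit * se


# Hypothetical arm span and height measurements in cm.
arm_span = [150.0, 180.0]
height = [155.0, 178.0]
T_CRIT = 12.706  # t cut-off for alpha = 0.05 with 1 degree of freedom

lower, upper = marginal_interval(arm_span, height, T_CRIT)
```

With only one residual degree of freedom the cut-off is very large, so the interval is wide; more observations would shrink both the cut-off and the standard error.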

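11 CIs from sampling theory. A 1 - α joint confidence region for β is the ellipsoid $(\beta - \hat\beta)' X'X (\beta - \hat\beta) \le p s^2 F_{p,N-p,\alpha}$, where p is the number of parameters, $F_{p,N-p,\alpha}$ is the α cut-off of the F distribution with p and N - p degrees of freedom, and $s^2 = \|y - X\hat\beta\|^2 / (N - p)$.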
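As a worked check for the one-parameter arm span model (p = 1), the joint region collapses to the marginal interval. This relies on the standard identity $F_{1,\nu,\alpha} = t_{\nu,\alpha/2}^2$, which is not stated on the slides.

```latex
\begin{align*}
(\beta - \hat\beta)^2 \, X'X
  &\le s^2 F_{1,N-1,\alpha} = s^2 \, t_{N-1,\alpha/2}^2 \\
|\beta - \hat\beta|
  &\le \frac{s}{\sqrt{X'X}} \; t_{N-1,\alpha/2}
   = \mathrm{se}(\hat\beta) \, t_{N-1,\alpha/2}
\end{align*}
```

For p > 1 the two no longer coincide: the ellipsoid accounts for all parameters jointly, while each marginal interval treats one parameter at a time.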

12 CIs from sampling theory. A 1 - α confidence interval for the expected response at the point $x_0$ is $x_0'\hat\beta \pm s \sqrt{x_0'(X'X)^{-1} x_0}\; t_{N-p,\alpha/2}$. This is an interval estimate for the response at the point $x_0$. It is essentially the same as the marginal interval, but is built from a CI for $x_0'\beta$.

13 CIs from sampling theory. A 1 - α confidence band for the expected response at any point x is $x'\hat\beta \pm s \sqrt{x'(X'X)^{-1} x}\, \sqrt{p F_{p,N-p,\alpha}}$. This is an interval estimate for the response function simultaneously at all points.

14 Likelihood intervals. A likelihood interval is interpreted as the region within which the likelihood is at least a specified fraction of its maximum.

15 Likelihood inference. The likelihood depends on β only through $\|y - X\beta\|$, so likelihood contours are of the form $\|y - X\beta\|^2 = c$ for some constant c. Intervals based on the likelihood depend on its rate of change and curvature.

16 Likelihood inference. Contours are of the form $\|y - X\beta\|^2 = c$ for some constant c. A likelihood region is bounded by the contour $c = \|y - X\hat\beta\|^2 \left\{1 + \frac{p}{N-p} F_{p,N-p,\alpha}\right\}$. In the linear case this is the same as the joint confidence region from sampling theory.
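The agreement with the sampling theory region can be checked in a few lines. This is a sketch, assuming X has full column rank so that the residual $y - X\hat\beta$ is orthogonal to the columns of X, and taking the bounding contour to be $c = \|y - X\hat\beta\|^2 \{1 + \frac{p}{N-p} F_{p,N-p,\alpha}\}$.

```latex
\begin{align*}
\|y - X\beta\|^2
  &= \|(y - X\hat\beta) - X(\beta - \hat\beta)\|^2 \\
  &= \|y - X\hat\beta\|^2 + (\beta - \hat\beta)' X'X (\beta - \hat\beta)
     \qquad \text{since } X'(y - X\hat\beta) = 0, \\
\|y - X\beta\|^2 \le c
  &\iff (\beta - \hat\beta)' X'X (\beta - \hat\beta)
     \le \|y - X\hat\beta\|^2 \, \frac{p}{N-p} \, F_{p,N-p,\alpha}
     = p s^2 F_{p,N-p,\alpha},
\end{align*}
```

where the last equality uses $s^2 = \|y - X\hat\beta\|^2 / (N - p)$. The right-hand condition is exactly the joint confidence ellipsoid.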

17 Bayesian Statistics. The basic idea is that you want to combine sources of information, compiling evidence. Bayesians use a prior $P(\beta, \sigma)$ to describe their beliefs about the parameters before obtaining observational information. What prior information did we have?

18 We often have strong information, but to appease other people's opinions, Bayesians sometimes flatten out their priors. For convenience, let's say β > 0 and $P(\sigma) \propto 1/\sigma$ is our prior information. Clearly this is a dumb prior, since we have a lot of information about the parameters.

19 Bayesians update their prior information with observational information: $P(\beta,\sigma \mid Y = y) = P(Y = y \mid \beta,\sigma)\, P(\beta,\sigma) / P(Y = y)$, so $P(\beta,\sigma \mid Y = y) \propto P(Y = y \mid \beta,\sigma)\, P(\beta,\sigma)$. The posterior distribution is the belief about the value of β conditional on the observed data Y = y.

20 In our example the likelihood and prior are $Y \mid \beta,\sigma \sim N(X\beta, \sigma^2)$ and $P(\beta,\sigma) \propto 1/\sigma$. The posterior, $P(\beta,\sigma \mid Y = y) \propto P(Y = y \mid \beta,\sigma)\, P(\beta,\sigma)$, is the belief about the value of β updated by the observation of data.
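Written out, this posterior has a simple form. The following is a sketch, assuming N independent observations with common variance $\sigma^2$ (so the likelihood is a product of N normal densities) and ignoring the β > 0 restriction, which only truncates the result.

```latex
\begin{align*}
P(\beta,\sigma \mid Y = y)
  &\propto \sigma^{-N} \exp\!\left(-\frac{\|y - X\beta\|^2}{2\sigma^2}\right)
     \cdot \frac{1}{\sigma}
   = \sigma^{-(N+1)} \exp\!\left(-\frac{\|y - X\beta\|^2}{2\sigma^2}\right), \\
P(\beta \mid Y = y)
  &= \int_0^\infty P(\beta,\sigma \mid Y = y) \, d\sigma
   \propto \left(\|y - X\beta\|^2\right)^{-N/2}.
\end{align*}
```

Under these assumptions the marginal posterior for β is largest where $\|y - X\beta\|^2$ is smallest, that is, at the least squares estimate.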

21 Point Estimates. Bayesians can choose from several point estimates, because the goal is to get the entire posterior distribution for the parameters. Let's find the values.

22 Bayesian Intervals. Bayesian intervals say that 100(1 - α)% of the belief about the true parameter value is within the interval. We generally use Bayesian Highest Posterior Density (HPD) regions, where the HPD region R satisfies $P(\beta \in R \mid y) = 1 - \alpha$ and $P(\beta_1 \mid y) \ge P(\beta_2 \mid y)$ for all $\beta_1 \in R$ and $\beta_2 \notin R$.
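One way to approximate an HPD region numerically is to evaluate the posterior on a grid and keep the highest-density points until they hold 1 - α of the total mass. The sketch below is in plain Python. The function name `hpd_region` is hypothetical, and a standard normal shape stands in for the posterior purely for illustration; it is not the posterior from the arm span example.

```python
import math


def hpd_region(grid, density, alpha):
    """Grid points forming an approximate 1 - alpha HPD region.

    `density` holds unnormalized posterior values on an evenly spaced `grid`.
    """
    total = sum(density)
    # Visit grid points from highest to lowest posterior density.
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        if mass >= (1.0 - alpha) * total:
            break
        kept.append(grid[i])
        mass += density[i]
    return sorted(kept)


# Stand-in posterior (an assumption for illustration): standard normal shape.
grid = [-5.0 + 0.01 * k for k in range(1001)]
density = [math.exp(-0.5 * b * b) for b in grid]

region = hpd_region(grid, density, alpha=0.05)
# For a unimodal posterior the region is a single interval.
hpd_lower, hpd_upper = region[0], region[-1]
```

For this symmetric stand-in the endpoints should land near ±1.96. For a skewed posterior, the HPD interval would be shorter than the equal-tailed interval with the same probability content.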

23 Matlab: [B,BINT,R,RINT,STATS] = REGRESS(Y,X) returns the estimate B, the interval BINT, and the residuals R.

24 [B,BINT,R,RINT,STATS] = REGRESS(Y,X) also returns the matrix RINT of intervals that can be used to diagnose outliers. If RINT(i,:) does not contain zero, then the i-th residual is larger than would be expected, at the 5% significance level. RINT is the residual ± the interval width from the expected response.

25 [B,BINT,R,RINT,STATS] = REGRESS(Y,X) STATS is a vector containing (1) the R-square statistic, (2) the F statistic, (3) the p value for the full model, and (4) an estimate of the error variance.
