Distribution Assumptions
1 Merlise Clyde Duke University November 22, 2016
2 Outline
Topics: Normality & Transformations, Box-Cox, Nonlinear Regression
Readings: Christensen Chapter 13 & Wakefield Chapter 6
3 Linear Model
Linear Model again: Y = µ + ɛ
Assumptions:
- µ ∈ C(X), i.e. µ = Xβ
- ɛ ~ N(0_n, σ² I_n): normal distribution for ɛ with constant variance
Outlier models: robustify with heavy-tailed error distributions
Computational advantages of normal models
8 Normality
Recall e = (I − P_X)Y = (I − P_X)(Xβ + ɛ) = (I − P_X)ɛ, so
e_i = ɛ_i − Σ_{j=1}^n h_ij ɛ_j
The Lyapunov CLT¹ implies that the residuals will be approximately normal (even for modest n), even if the errors are not normal: "supernormality" of residuals.
¹ the terms are independent but not identically distributed
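The supernormality effect is easy to see in a small simulation (all quantities below are illustrative, not from the slides): even when the errors are skewed, each residual is a weighted combination of all n errors, so the residuals look closer to normal than the errors do.

```r
set.seed(42)
n <- 50
X <- cbind(1, rnorm(n))                 # design matrix: intercept + one predictor
P <- X %*% solve(t(X) %*% X) %*% t(X)   # projection (hat) matrix P_X
eps <- rexp(n) - 1                      # skewed, mean-zero (centered exponential) errors
e <- (diag(n) - P) %*% eps              # residuals e = (I - P_X) eps
# each e_i mixes all the eps_j, so e is noticeably less skewed than eps
```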
14 Q-Q Plots
Order the e_i: e_(1) ≤ e_(2) ≤ ... ≤ e_(n) (sample order statistics or sample quantiles)
Let z_(1) ≤ z_(2) ≤ ... ≤ z_(n) denote the expected order statistics of a sample of size n from a standard normal distribution (theoretical quantiles)
If the e_i are normal then E[e_(i)] = σ z_(i)
Expect the points in a scatter plot of e_(i) against z_(i) to fall on a straight line.
Judgment call: use simulations to gain experience!
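One way to build that judgment is to simulate normal samples of the same size as your residuals and inspect their Q-Q plots; a possible sketch in R (the sample size here is chosen to match the 28 animals in the running example, but is otherwise arbitrary):

```r
set.seed(1)
n <- 28                                  # e.g. the 28 animals in the example
op <- par(mfrow = c(2, 4))
for (i in 1:8) qqnorm(rnorm(n), main = paste("simulated normal, n =", n))
par(op)
# even truly normal samples wiggle at this n; calibrate your eye before
# declaring non-normality from a single Q-Q plot
```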
19 Animal Example
[Figure: scatter plot of brain weight (g) against body weight (kg), original units]
20 Residual Plots
[Figure: the four standard diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage with Cook's distance); the African and Asian elephants, Human, and Brachiosaurus are flagged]
21 Box-Cox Transformation
Box and Cox (1964) suggested a family of power transformations for Y > 0:
U(Y, λ) = Y^(λ) = (Y^λ − 1)/λ for λ ≠ 0, and log(Y) for λ = 0
Estimate λ by maximum likelihood: assume U(Y, λ) = Y^(λ) ~ N(Xβ, σ²), so L(λ, β, σ²) ∝ Π_i f(y_i | λ, β, σ²)
The Jacobian term is Π_i y_i^(λ−1) for all λ
The profile likelihood, substituting the MLEs of β and σ² for each value of λ, is
log L(λ) ∝ (λ − 1) Σ_i log(y_i) − (n/2) log(SSE(λ))
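The profile likelihood for λ can be computed with boxcox from the MASS package, which also contains the Animals brain/body-weight data used in the running example; a minimal sketch:

```r
library(MASS)                            # provides boxcox() and the Animals data
fit <- lm(brain ~ body, data = Animals)
bc <- boxcox(fit, lambda = seq(-1, 1, by = 0.05))  # plots log L(lambda) with a 95% CI
lambda.hat <- bc$x[which.max(bc$y)]      # MLE of lambda on the grid
```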
27 Profile Likelihood
[Figure: profile log likelihood as a function of λ, with the 95% interval marked]
28 Residuals After Transformation of Response
[Figure: the four diagnostic plots after transforming the response; Mouse, Golden hamster, African elephant, and Brachiosaurus are flagged]
29 Residuals After Transformation of Both
[Figure: the four diagnostic plots after transforming both response and predictor; Triceratops, Dipliodocus, and Brachiosaurus are flagged]
30 Transformed Data
[Figure: brain weight (g) against body weight (kg), both on logarithmic scales, with Triceratops, Dipliodocus, and Brachiosaurus labeled]
31 Test that Dinos are Outliers
[Table: anova F-test comparing the models with and without the dinosaur indicators (Res.Df, RSS, Df, Sum of Sq, F, Pr(>F))]
[Table: coefficient estimates, standard errors, t values, and Pr(>|t|) for (Intercept), log(body), Triceratops, Brachiosaurus, Dipliodocus]
Dinosaurs come from a different population from mammals
34 Model Selection Priors
brains.bas = bas.lm(log(brain) ~ log(body) + diag(28), data = animals,
                    prior = "hyper-g-n", a = 3,
                    modelprior = beta.binomial(1, 28),
                    method = "MCMC", n.models = 2^17, MCMC.it = 2^18)
# check for convergence
plot(brains.bas$probne0, brains.bas$probs.mcmc)
35 image(brains.bas)
[Figure: log posterior odds by model rank for the intercept, log(body), and the 28 outlier-indicator columns diag(28)1 through diag(28)28]
rownames(animals)[c(6, 14, 16, 26)]
"Dipliodocus" "Human" "Triceratops" "Brachiosaurus"
36 Variance Stabilizing Transformations
If Y − µ is (approximately) N(0, h(µ)), so that the variance h(µ) depends on the mean,
the delta method implies that g(Y) ≈ N(g(µ), g′(µ)² h(µ)).
Find a function g such that g′(µ)² h(µ) is constant: then g(Y) ≈ N(g(µ), c).
Poisson counts (Y > 3): g is the square root transformation
Binomial: arcsin(√Y)
Note: the transformation for normality may not be the same as the variance stabilizing transformation; Box-Cox assumes the mean function is correct.
Generalized linear models are preferable.
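A quick simulation illustrates why the square root is variance stabilizing for Poisson counts: h(µ) = µ and g(y) = √y give g′(µ)² h(µ) = 1/4 regardless of µ (the means below are arbitrary):

```r
set.seed(3)
for (mu in c(5, 20, 100)) {
  y <- rpois(1e5, mu)                    # Poisson counts with mean mu
  cat("mean", mu, ": var(Y) =", round(var(y), 2),
      " var(sqrt(Y)) =", round(var(sqrt(y)), 3), "\n")
}
# var(Y) grows with the mean while var(sqrt(Y)) stays near 1/4
```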
42 Nonlinear Models
Drug concentration of caldralazine at time x_i in a cardiac failure patient given a single 30 mg dose (D = 30):
µ(β) = (D/V) exp(−κ_e x_i), with β = (V, κ_e)
V = volume and κ_e is the elimination rate
If log(Y_i) = log(µ(β)) + ɛ_i with ɛ_i iid N(0, σ²), then the model is intrinsically linear (can transform to a linear model):
log(µ(β)) = log[(D/V) exp(−κ_e x_i)] = log(D) − log(V) − κ_e x_i
log(Y_i) − log(30) = β_0 + β_1 x_i + ɛ_i
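A sketch of the intrinsically linear fit with lm, on simulated concentrations since the original data are not shown in the slides (the sampling times and the "true" V and κ_e below are made up for illustration):

```r
set.seed(7)
x <- c(0.5, 1, 2, 4, 6, 8, 12, 24)       # hypothetical sampling times (hours)
V <- 25; ke <- 0.25                       # made-up true volume and elimination rate
y <- (30 / V) * exp(-ke * x) * exp(rnorm(length(x), 0, 0.1))  # multiplicative errors
fit <- lm(log(y) ~ x)                     # log Y = log(30/V) - ke * x + eps
V.hat  <- 30 / exp(coef(fit)[1])          # back-transform the intercept
ke.hat <- -coef(fit)[2]                   # slope estimates -ke
```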
46 Nonlinear Least Squares
> logconc.nlm = nls(log(y) ~ log((30/V)*exp(-k*x)), data = df,
+                   start = list(V = Vhat, k = Khat))
> summary(logconc.nlm)
Formula: log(y) ~ log((30/V) * exp(-k * x))
[Table: Estimate, Std. Error, t value, Pr(>|t|) for V and k; k significant at ***]
Residual standard error on 6 degrees of freedom
Number of iterations to convergence: 0
Achieved convergence tolerance: 3.978e-09
47 Additive Errors
Under multiplicative lognormal errors, the model is equivalent to a linear model.
With additive Gaussian errors (or other distributions), the model is intrinsically nonlinear: use nonlinear least squares (or posterior sampling).
Y_i = (30/V) exp(−k x_i) + ɛ_i, with ɛ_i iid N(0, σ²)
51 Intrinsically Nonlinear Model
> summary(conc.nlm)
Formula: y ~ (30/V) * exp(-k * x)
[Table: Estimate, Std. Error, t value, Pr(>|t|) for V (p on the order of 1e-07 ***) and k (p on the order of 1e-06 ***)]
Residual standard error on 6 degrees of freedom
Number of iterations to convergence: 4
Achieved convergence tolerance: 7.698e-06
52 Fitted Values & Residuals
[Figure: observed y and fitted values from the lognormal and normal (additive-error) fits against x, with residuals plotted against exp(predict(logconc.nlm)); the lognormal fit is back-transformed to the original scale]
53 Functions of Interest
Interest is in the clearance: V κ_e
and the elimination half-life: x_{1/2} = log 2 / κ_e
Use properties of MLEs: asymptotically β̂ ~ N(β, I(β̂)⁻¹)
Apply the (multivariate) delta method for transformations to obtain asymptotic distributions.
Bayes: obtain the posterior directly for parameters and functions of parameters! Priors? Constraints on distributions?
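A minimal delta-method sketch for the half-life, again on simulated data since the original concentrations are not in the slides: since d/dκ_e [log 2 / κ_e] = −log 2 / κ_e², the standard error of the half-life is approximately (log 2 / κ̂_e²) · SE(κ̂_e).

```r
set.seed(11)
x <- c(0.5, 1, 2, 4, 6, 8, 12, 24)        # hypothetical sampling times (hours)
y <- (30 / 25) * exp(-0.25 * x) + rnorm(length(x), 0, 0.03)  # additive errors
fit <- nls(y ~ (30 / V) * exp(-k * x), start = list(V = 20, k = 0.2))
k.hat <- coef(fit)["k"]
k.se  <- summary(fit)$coefficients["k", "Std. Error"]
x.half    <- log(2) / k.hat               # estimated elimination half-life
x.half.se <- (log(2) / k.hat^2) * k.se    # delta-method standard error
```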
58 Summary
- The optimal transformation for normality (MLE) depends on the choice of mean function.
- It may not be the same as the variance stabilizing transformation.
- Nonlinear models suggested by theory, or generalized linear models, are alternatives.
- Normal estimates may be useful approximations for large n, or as starting values for more complex models (where convergence may be sensitive to starting values).
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed
More informationSB1a Applied Statistics Lectures 9-10
SB1a Applied Statistics Lectures 9-10 Dr Geoff Nicholls Week 5 MT15 - Natural or canonical) exponential families - Generalised Linear Models for data - Fitting GLM s to data MLE s Iteratively Re-weighted
More informationStatistics for analyzing and modeling precipitation isotope ratios in IsoMAP
Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP The IsoMAP uses the multiple linear regression and geostatistical methods to analyze isotope data Suppose the response variable
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationOutline. Statistical inference for linear mixed models. One-way ANOVA in matrix-vector form
Outline Statistical inference for linear mixed models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark general form of linear mixed models examples of analyses using linear mixed
More informationRegression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.
Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationMultiple Linear Regression
Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper
Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationStatistics for EES 7. Linear regression and linear models
Statistics for EES 7. Linear regression and linear models Dirk Metzler http://www.zi.biologie.uni-muenchen.de/evol/statgen.html 26. May 2009 Contents 1 Univariate linear regression: how and why? 2 t-test
More informationRegression Model Building
Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated
More informationNonlinear Regression
Nonlinear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Nonlinear Regression 1 / 36 Nonlinear Regression 1 Introduction
More informationLecture 6: Linear Regression
Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i
More informationIntroduction The framework Bias and variance Approximate computation of leverage Empirical evaluation Discussion of sampling approach in big data
Discussion of sampling approach in big data Big data discussion group at MSCS of UIC Outline 1 Introduction 2 The framework 3 Bias and variance 4 Approximate computation of leverage 5 Empirical evaluation
More informationGeneral Linear Model: Statistical Inference
Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter 4), least
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationStat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb
Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra
More information5.2 Expounding on the Admissibility of Shrinkage Estimators
STAT 383C: Statistical Modeling I Fall 2015 Lecture 5 September 15 Lecturer: Purnamrita Sarkar Scribe: Ryan O Donnell Disclaimer: These scribe notes have been slightly proofread and may have typos etc
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationTransformations. 7.3 Extensions of Simple Linear Regression. β > 1 0 < β < 1. β < 0. Power Laws: Y = α X β X 4 X 1/2 X 1/3 X 3 X 1/4 X 2 X 1 X 1/2
Ismor Fischer, 5/29/202 7.3-7.3 Extensions of Simple Linear Regression Transformations Power Laws: Y = α X β β > 0 < β < X 4 X /2 X /3 X 3 X 2 X /4 β < 0 X 2 X X /2 0.0 Ismor Fischer, 5/29/202 7.3-2 If
More informationChapter 7: Variances. October 14, In this chapter we consider a variety of extensions to the linear model that allow for more gen-
Chapter 7: Variances October 14, 2018 In this chapter we consider a variety of extensions to the linear model that allow for more gen- eral variance structures than the independent, identically distributed
More informationOutline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs
Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More information