Distribution Assumptions


Merlise Clyde, Duke University, November 22, 2016

Outline

Topics: normality and transformations, Box-Cox, nonlinear regression.
Readings: Christensen Chapter 13 & Wakefield Chapter 6.

Linear Model

The linear model again: $Y = \mu + \epsilon$

Assumptions:
- $\mu \in C(X)$, with $\mu = X\beta$
- $\epsilon \sim N(0_n, \sigma^2 I_n)$: a normal distribution for $\epsilon$ with constant variance

Related topics: outlier models, robustifying inference with heavy-tailed error distributions, and the computational advantages of normal models.

Normality

Recall
$$e = (I - P_X)Y = (I - P_X)(X\beta + \epsilon) = (I - P_X)\epsilon$$
so that
$$e_i = \epsilon_i - \sum_{j=1}^{n} h_{ij}\,\epsilon_j.$$

The Lyapunov CLT (for sums of independent but not identically distributed terms) implies that the residuals will be approximately normal (even for modest $n$), even if the errors are not normal: the "supernormality" of residuals.
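To get a feel for this supernormality, a small simulation (a sketch, not from the slides) can compare Q-Q plots of skewed errors with Q-Q plots of the residuals from a fit to data generated with those errors:

set.seed(42)
n <- 30; p <- 10
X <- matrix(rnorm(n * p), n, p)          # design with several predictors
eps <- rexp(n) - 1                       # mean-zero, right-skewed errors
y <- drop(X %*% rep(1, p)) + eps
fit <- lm(y ~ X)
par(mfrow = c(1, 2))
qqnorm(eps, main = "Errors"); qqline(eps)
qqnorm(residuals(fit), main = "Residuals"); qqline(residuals(fit))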

Q-Q Plots

Order the residuals $e_i$: $e_{(1)} \le e_{(2)} \le \dots \le e_{(n)}$, the sample order statistics or sample quantiles.

Let $z_{(1)} \le z_{(2)} \le \dots \le z_{(n)}$ denote the expected order statistics of a sample of size $n$ from a standard normal distribution, the theoretical quantiles.

If the $e_i$ are normal then $E[e_{(i)}] = \sigma z_{(i)}$, so the points in a scatter plot of $e_{(i)}$ against $z_{(i)}$ should fall roughly on a straight line. Whether a departure matters is a judgment call; use simulations to gain experience!
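For the simulation advice, one hedged sketch (not from the slides) is to plot the standardized residuals of a fitted lm object fit (such as the one in the simulation above, or your own model) next to Q-Q plots of truly normal samples of the same size:

par(mfrow = c(3, 3))
qqnorm(rstandard(fit), main = "Observed residuals"); qqline(rstandard(fit))
for (i in 1:8) {                         # reference panels from genuinely normal data
  z <- rnorm(length(residuals(fit)))
  qqnorm(z, main = "Simulated normal"); qqline(z)
}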

Animal Example (original units): scatterplot of brain weight (g) against body weight (kg).

Residual Plots: the four standard diagnostic panels for the fit in the original units (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours). The African elephant, Asian elephant, Human, and Brachiosaurus appear as labelled extreme points.
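The slides do not show the fitting code; a minimal sketch that produces diagnostic panels of this kind, assuming the Animals data from the MASS package (body weight in kg, brain weight in g):

library(MASS)
animals <- Animals                       # 28 species, including three dinosaurs
fit0 <- lm(brain ~ body, data = animals)
par(mfrow = c(2, 2))
plot(fit0)                               # the four standard diagnostic plots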

Box-Cox Transformation

Box and Cox (1964) suggested a family of power transformations for $Y > 0$:
$$U(Y, \lambda) = Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \log(Y) & \lambda = 0 \end{cases}$$

Estimate $\lambda$ by maximum likelihood, assuming $U(Y, \lambda) = Y^{(\lambda)} \sim N(X\beta, \sigma^2 I)$, so that $L(\lambda, \beta, \sigma^2) \propto \prod_i f(y_i \mid \lambda, \beta, \sigma^2)$. The Jacobian term is $\prod_i y_i^{\lambda - 1}$ for all $\lambda$.

The profile likelihood, based on substituting the MLEs of $\beta$ and $\sigma^2$ for each value of $\lambda$, satisfies
$$\log L(\lambda) \propto (\lambda - 1) \sum_i \log(y_i) - \frac{n}{2} \log\big(\text{SSE}(\lambda)\big).$$

Profile Likelihood: plot of the profile log-likelihood as a function of $\lambda$, with the 95% interval for $\lambda$ marked.
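A plot of this kind can be produced with boxcox from the MASS package; a minimal sketch continuing with the animals data frame from the sketch above (the exact call behind the slide is not shown):

library(MASS)
bc <- boxcox(brain ~ body, data = animals)   # plots log-likelihood vs lambda with a 95% interval
bc$x[which.max(bc$y)]                        # rough MLE of lambda on the default grid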

Residuals After Transformation of Response: the four diagnostic panels for the fit with the transformed response; the labelled cases include the Mouse, Golden hamster, and African elephant, with Brachiosaurus prominent in the leverage panel.

Residuals After Transformation of Both: the same four diagnostic panels after transforming both response and predictor; the labelled cases are now the three dinosaurs, Brachiosaurus, Triceratops, and Dipliodocus.

Transformed Data: brain weight (g) against body weight (kg), both on logarithmic scales; Triceratops, Dipliodocus, and Brachiosaurus are labelled and sit apart from the mammals.

Test that Dinos are Outliers

Add mean-shift indicator variables for Triceratops, Brachiosaurus, and Dipliodocus to the log-log model and test whether they are needed.

ANOVA comparison:
Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)

Coefficient summary:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
log(body)
Triceratops
Brachiosaurus
Dipliodocus

Dinosaurs come from a different population from mammals.
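The calls behind these tables are not shown on the slides; a minimal sketch of one way to carry out such a test, assuming the MASS Animals data and indicator variables for the three dinosaurs:

library(MASS)
animals <- Animals
for (d in c("Triceratops", "Brachiosaurus", "Dipliodocus"))
  animals[[d]] <- as.numeric(rownames(animals) == d)
fit.red  <- lm(log(brain) ~ log(body), data = animals)
fit.full <- lm(log(brain) ~ log(body) + Triceratops + Brachiosaurus + Dipliodocus,
               data = animals)
anova(fit.red, fit.full)                 # F test that the dinosaur indicators are needed
summary(fit.full)$coefficients           # t tests for the individual indicators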

Model Selection Priors

library(BAS)
brains.bas = bas.lm(log(brain) ~ log(body) + diag(28), data = animals,
                    prior = "hyper-g-n", a = 3,
                    modelprior = beta.binomial(1, 28),
                    method = "mcmc", n.models = 2^17, MCMC.it = 2^18)
# check for convergence: compare MCMC and renormalized inclusion probabilities
plot(brains.bas$probne0, brains.bas$probs.mcmc)

image(brains.bas)

The image plot displays, for each of the top models (ordered by log posterior odds against model rank), which columns are included: the intercept, log(body), and the 28 case indicators diag(28)1 through diag(28)28. The case indicators that appear in the top models correspond to

rownames(animals)[c(6, 14, 16, 26)]
"Dipliodocus"  "Human"  "Triceratops"  "Brachiosaurus"
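Not shown on the slides, but a natural companion check is to list the columns with the largest marginal posterior inclusion probabilities; a brief sketch, assuming the brains.bas object above (probne0 appears on the previous slide, and namesx is assumed to hold the column names):

pip <- brains.bas$probne0                # marginal posterior inclusion probabilities
names(pip) <- brains.bas$namesx          # assumed component holding column names
sort(pip, decreasing = TRUE)[1:6]        # case indicators with large values flag potential outliers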

Variance Stabilizing Transformations

If $Y - \mu$ is (approximately) $N(0, h(\mu))$, the delta method implies that
$$g(Y) \;\dot\sim\; N\!\big(g(\mu),\; g'(\mu)^2 h(\mu)\big).$$

Find a function $g$ such that $g'(\mu)^2 h(\mu)$ is constant, so that $g(Y) \;\dot\sim\; N(g(\mu), c)$.

- Poisson counts ($Y > 3$): $g$ is the square-root transformation.
- Binomial: $\arcsin(\sqrt{Y})$.

Note: the transformation for normality may not be the same as the variance stabilizing transformation; Box-Cox assumes the mean function is correct. Generalized linear models are preferable.
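As a check of the Poisson case (a standard calculation, not from the slides): with $h(\mu) = \mu$ and $g(Y) = \sqrt{Y}$, $g'(\mu)^2 h(\mu) = \big(\tfrac{1}{2\sqrt{\mu}}\big)^2 \mu = \tfrac{1}{4}$, so $\sqrt{Y}$ has approximately constant variance $1/4$. A two-line simulation confirms this:

mu <- c(5, 20, 80)
sapply(mu, function(m) var(sqrt(rpois(1e5, m))))   # each close to 0.25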

Nonlinear Models

The drug concentration of caldralazine at time $x_i$ in a cardiac failure patient given a single 30 mg dose ($D = 30$) is modelled as
$$\mu(\beta) = \frac{D}{V} \exp(-\kappa_e x_i), \qquad \beta = (V, \kappa_e),$$
where $V$ is the volume and $\kappa_e$ is the elimination rate.

If $\log(Y_i) = \log(\mu(\beta)) + \epsilon_i$ with $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, then the model is intrinsically linear (it can be transformed to a linear model):
$$\log(\mu(\beta)) = \log\!\left[\frac{D}{V} \exp(-\kappa_e x_i)\right] = \log(D) - \log(V) - \kappa_e x_i$$
$$\log(Y_i) - \log(30) = \beta_0 + \beta_1 x_i + \epsilon_i,$$
with $\beta_0 = -\log(V)$ and $\beta_1 = -\kappa_e$.
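Because the model is intrinsically linear on the log scale, ordinary least squares recovers $V$ and $\kappa_e$; a minimal sketch (not from the slides), assuming a hypothetical data frame df with time x and concentration y:

fit.lin <- lm(I(log(y) - log(30)) ~ x, data = df)
b <- coef(fit.lin)
Vhat <- exp(-b[[1]])                     # beta0 = -log(V)
khat <- -b[[2]]                          # beta1 = -kappa_e
c(V = Vhat, k = khat)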

Nonlinear Least Squares

> conc.nlm = nls(log(y) ~ log((30/v) * exp(-k * x)), data = df,
                 start = list(v = vhat, k = khat))
> summary(conc.nlm)

Formula: log(y) ~ log((30/v) * exp(-k * x))

Parameters:
              Estimate  Std. Error  t value  Pr(>|t|)
V.(Intercept)
k.x                                                ***

Residual standard error: on 6 degrees of freedom
Number of iterations to convergence: 0
Achieved convergence tolerance: 3.978e-09

Additive Errors

Under multiplicative log-normal errors the model is equivalent to a linear model. With additive Gaussian errors (or other error distributions) the model is intrinsically nonlinear, requiring nonlinear least squares (or posterior sampling):
$$Y_i = \frac{30}{V} \exp(-k\, x_i) + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2).$$
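The call that produced the summary on the next slide is not shown; a sketch of an additive-error fit of this form, reusing the hypothetical df and the starting values Vhat and khat from the log-linear fit:

conc.nlm <- nls(y ~ (30 / V) * exp(-k * x), data = df,
                start = list(V = Vhat, k = khat))
summary(conc.nlm)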

Intrinsically Nonlinear Model

> summary(conc.nlm)

Formula: y ~ (30/V) * exp(-k * x)

Parameters:
    Estimate  Std. Error  t value  Pr(>|t|)
V                                  e-07 ***
k                                  e-06 ***
---
Residual standard error: on 6 degrees of freedom
Number of iterations to convergence: 4
Achieved convergence tolerance: 7.698e-06

Fitted Values & Residuals: panels comparing the log-normal and normal (additive-error) fits, plotting y against x with both fitted curves and y against the back-transformed fitted values exp(predict(logconc.nlm)).

Functions of Interest

Interest is in the clearance $V \kappa_e$ and the elimination half-life $x_{1/2} = \log(2)/\kappa_e$.

Use properties of MLEs: asymptotically $\hat\beta \sim N\!\big(\beta, I(\hat\beta)^{-1}\big)$, and the (multivariate) delta method yields asymptotic distributions for transformations of $\hat\beta$.

Bayes: obtain the posterior directly for parameters and for functions of the parameters! Priors? Constraints on distributions?
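A minimal sketch of the delta-method calculation for the clearance $V \kappa_e$ (not from the slides), using the asymptotic covariance matrix from the hypothetical nls fit above; the deltaMethod function in the car package automates calculations of this kind:

theta <- coef(conc.nlm)                  # named estimates of V and k
Sigma <- vcov(conc.nlm)                  # asymptotic covariance of (V, k)
clearance <- theta[["V"]] * theta[["k"]]
grad <- c(theta[["k"]], theta[["V"]])    # gradient of V * k with respect to (V, k)
se <- sqrt(drop(t(grad) %*% Sigma %*% grad))
c(estimate = clearance, se = se)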

Summary

- The optimal transformation for normality (by maximum likelihood) depends on the choice of mean function.
- It may not be the same as the variance stabilizing transformation.
- Nonlinear models suggested by theory, or generalized linear models, are alternatives.
- Normal estimates may be useful approximations for large p, or as starting values for more complex models (where convergence may be sensitive to starting values).
