STAT5044: Regression and Anova
Inyoung Kim
Outline
1 How to check assumptions
Assumptions
- Linearity: scatter plot, residual plot
- Randomness: runs test; Durbin-Watson test when the data can be arranged in time order
- Constant variance: scatter plot, residual plot (absolute-residual plot); Brown-Forsythe test, Breusch-Pagan test
- Normality of errors: box plot, histogram, normal probability plot; Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test
Remark: the normal probability plot provides no information if the assumption of linearity and/or constant variance is violated.
Influential point
An influential point combines a large absolute residual with high leverage (h_ii).
Leverage: the diagonal value of the hat matrix H,
H = [ h_11 h_12 ... h_1n
      h_21 h_22 ... h_2n
      ...
      h_n1 h_n2 ... h_nn ]
High leverage means large h_ii.
Residual
Three types:
- Ordinary: r_i = y_i − ŷ_i, where E(r_i) = 0 and var(r_i) = (1 − h_ii)σ²
- Standardized: r_i / (σ̂ √(1 − h_ii))
- Studentized (or jackknife): r_i / (σ̂_(i) √(1 − h_ii)) ~ t_{n−p−1}
where σ̂²_(i) = (Σ_j r²_j(i)) / (n − p − 1), p is the number of parameters, h_ii is the leverage (the ith diagonal value of the hat matrix), and
r_j(i) = y_j − ŷ_j(i) = y_j − (β̂_0(i) + β̂_1(i) x_j).
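The three residual types above can be computed directly from the hat matrix. Below is a minimal numpy sketch (Python is used here for illustration, while the slides use R; the data are made up, and p counts the parameters). The last two lines check Fact 1 from the next slides, r_0(0) = r_0/(1 − h_00), by actually refitting without the first point.

```python
import numpy as np

# toy data (made up for illustration)
x = np.arange(10, dtype=float)
y = np.array([98., 135., 162., 178., 221., 232., 283., 300., 374., 395.])
n, p = len(y), 2                       # p = number of parameters (intercept + slope)

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverages h_ii
r = y - H @ y                          # ordinary residuals r_i = y_i - yhat_i

sigma2 = r @ r / (n - p)               # sigma^2-hat (MSE)
standardized = r / np.sqrt(sigma2 * (1 - h))

# deleted variance estimate via Fact 2: sum_j r_j(i)^2 = (n-p)*sigma2 - r_i^2/(1-h_ii)
sigma2_del = ((n - p) * sigma2 - r**2 / (1 - h)) / (n - p - 1)
studentized = r / np.sqrt(sigma2_del * (1 - h))

# sanity check of Fact 1 for i = 0: refit without the first point
beta_0 = np.linalg.lstsq(X[1:], y[1:], rcond=None)[0]
jackknife_0 = y[0] - X[0] @ beta_0     # r_0(0) = y_0 - yhat_0(0)
```

This avoids n leave-one-out refits: everything comes from one fit plus the leverages.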
Properties of residuals
- They sum to zero: Σ_i r_i = 0
- They are not independent
Residual (jackknife)
r_i(i) = y_i − ŷ_i(i) ~ N(0, σ²/(1 − h_ii))
where the subindex (i) indicates an estimate computed without point i: the residual for y_i is computed using the regression fit without y_i, then scaled.
Studentized residual: r_i(i) / √(var̂(r_i(i))), where
r_i(i) = y_i − ŷ_i(i) = y_i − [β̂_0(i) + β̂_1(i) x_i].
Studentized residual
r_i(i) / √(var̂(r_i(i))) = r_i / (σ̂_(i) √(1 − h_ii)), by Facts 1 and 2.
Fact 1: r_i(i) = r_i / (1 − h_ii)
Fact 2: Σ_j r²_j(i) = (n − p) σ̂² − r²_i / (1 − h_ii), so
σ̂²_(i) = [(n − p) σ̂² − r²_i / (1 − h_ii)] / (n − p − 1).
Residual
Using Fact 1, r_i(i) = r_i / (1 − h_ii), we have
Var(r_i(i)) = Var(r_i) / (1 − h_ii)² = σ² / (1 − h_ii).
But σ² is unknown, so we use σ̂²_(i).
Residual
Studentized residual:
[r_i / (1 − h_ii)] / √(σ̂²_(i) / (1 − h_ii)) = r_i / √(σ̂²_(i) (1 − h_ii))
where σ̂²_(i) = Σ_j r²_j(i) / (n − p − 1) and Σ_j r²_j(i) = (n − p) σ̂² − r²_i / (1 − h_ii).
NOTE: flag a large residual if |r_j(i)| > 3 after studentizing.
An expression for the distribution of the standardized residuals was obtained (Weisberg, 1985).
Studentized residual
r_i(i) / √(var̂(r_i(i))) = r_i / (σ̂_(i) √(1 − h_ii)) ~ t_{n−p−1}
You don't need to know how to prove this in our class (it is beyond our scope).
Comparison with the standardized residual
Standardized residual: (r_i − 0) / √(var(r_i)) = (r_i − 0) / √(σ² (1 − h_ii)) ≈ r_i / √(σ̂² (1 − h_ii))
If there are outliers with large absolute residuals, then σ̂² may not be a good estimate.
Residuals are not independent and have different variances, so the distribution of the standardized residual is not a t distribution. People usually ignore these problems.
Residual plots in R
> lmfit <- lm(y ~ x)
> plot(fitted(lmfit), residuals(lmfit), xlab="Fitted", ylab="Residuals")
> abline(h=0)
> plot(fitted(lmfit), abs(residuals(lmfit)), xlab="Fitted", ylab="|Residuals|")
Residual plots
[figure: residuals (and absolute residuals) plotted against fitted values]
Leverage
H = X (XᵗX)⁻¹ Xᵗ
Let xᵢᵗ = (1, x_i), let X have rows x₁ᵗ, ..., xₙᵗ, and let A = (XᵗX)⁻¹. Then
H (n×n) = X A Xᵗ, so the (i,j)th element of H is xᵢᵗ A xⱼ.
NOTE: for simple linear regression,
A = (XᵗX)⁻¹ = [ 1/n + x̄²/S_xx   −x̄/S_xx
                −x̄/S_xx          1/S_xx ]
Leverage
The (i,j)th element of H is (1, x_i) (XᵗX)⁻¹ (1, x_j)ᵗ; h_ii is the leverage of the ith point:
h_ii = 1/n + (x_i − x̄)² / S_xx   (check!)
High-leverage point: h_ii is large, that is, (x_i − x̄)² is large.
1/n ≤ h_ii ≤ 1
Idea: if the design is regular and n is large (n → ∞),
h_ii = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)² = O(1/n) → 0.
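The closed-form leverage above can be checked numerically against the diagonal of the hat matrix. A short numpy sketch (Python for illustration; the x values are made up, with one point deliberately far from the mean):

```python
import numpy as np

x = np.array([1., 2., 3., 4., 10.])   # made-up predictor; 10 lies far from the mean
n = len(x)
X = np.column_stack([np.ones(n), x])
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages from the hat matrix

# closed form for simple linear regression: h_ii = 1/n + (x_i - xbar)^2 / S_xx
Sxx = np.sum((x - x.mean())**2)
h_formula = 1/n + (x - x.mean())**2 / Sxx
```

As a by-product, the leverages sum to the number of parameters (trace(H) = p = 2 here), which is why a common rule of thumb flags h_ii above 2p/n.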
Why is leverage in this range?
Since H is symmetric and idempotent, h_ii = Σ_j h²_ji, so
Σ_{j≠i} h²_ji = h_ii − h²_ii ≥ 0.
Hence 0 ≤ h_ii (1 − h_ii). Since h_ii ≥ 0, we get 1 − h_ii ≥ 0, that is, h_ii ≤ 1.
We also know that h_ii ≥ 1/n because h_ii = 1/n + (x_i − x̄)²/S_xx ≥ 1/n.
Cook's distance
Measure how influential point i is by comparing the fitted values ŷ_j with the fitted values ŷ_j(i) obtained without observation i:
ŷ_(i) = (ŷ_1(i), ŷ_2(i), ..., ŷ_n(i))ᵗ
where the subindex (i) indicates that the fitted values are obtained using all observations except the ith observation.
The ith Cook's distance is
D_i = {ŷ − ŷ_(i)}ᵗ {ŷ − ŷ_(i)} / (p σ̂²)
where ŷ = X β̂ and ŷ_(i) = X β̂_(i).
Cook's distance
D_i = {β̂ − β̂_(i)}ᵗ XᵗX {β̂ − β̂_(i)} / (p σ̂²), compared against the F_{p,n−p} distribution.
Identify the points that have relatively large Cook's distance using Fact 3:
β̂ − β̂_(i) = [r_i / (1 − h_ii)] (XᵗX)⁻¹ x_i
so that
D_i = (r_i / (1 − h_ii))² xᵢᵗ (XᵗX)⁻¹ (XᵗX) (XᵗX)⁻¹ x_i / (p σ̂²).
Cook's distance
D_i = (r_i / (1 − h_ii))² xᵢᵗ (XᵗX)⁻¹ (XᵗX) (XᵗX)⁻¹ x_i / (p σ̂²) depends on two factors:
- the size of the residual r_i
- the leverage value h_ii
The larger either r_i or h_ii is, the larger D_i. The ith case can be influential (1) by having a large residual and only a moderate leverage value h_ii, (2) by having a large leverage value h_ii with only a moderately sized residual, or (3) by having both a large residual and a large leverage value.
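The Fact 3 form of D_i needs no refitting, since xᵢᵗ(XᵗX)⁻¹xᵢ = h_ii. The numpy sketch below (Python for illustration; made-up data with one high-leverage, large-residual point) computes D_i that way and checks it against the definition {ŷ − ŷ_(i)}ᵗ{ŷ − ŷ_(i)}/(pσ̂²) for one point by actually refitting:

```python
import numpy as np

# made-up data; the last point has high leverage and a large residual
x = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 20.])
y = np.array([1., 3., 2., 5., 4., 6., 7., 6., 9., 30.])
n, p = len(y), 2
X = np.column_stack([np.ones(n), x])
A = np.linalg.inv(X.T @ X)
h = np.diag(X @ A @ X.T)
r = y - X @ A @ X.T @ y
sigma2 = r @ r / (n - p)

# Fact 3 form: D_i = (r_i/(1-h_ii))^2 * x_i'(X'X)^{-1}x_i / (p*sigma2)
D = (r / (1 - h))**2 * h / (p * sigma2)   # note x_i'(X'X)^{-1}x_i = h_ii

# definition form for the last point: compare yhat with yhat_(i) from a refit
i = n - 1
beta_i = np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0]
diff = X @ A @ X.T @ y - X @ beta_i       # yhat - yhat_(i)
D_def = diff @ diff / (p * sigma2)
```

The agreement of the two forms is exact, not approximate, which is what makes Cook's distance cheap to compute for every observation at once.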
Cook's distance in R
library(stats)    # for cooks.distance
library(faraway)  # for halfnorm
lmfit <- lm(y ~ x)
cook <- cooks.distance(lmfit)
par(mfcol=c(1,2))
halfnorm(cook, 3, ylab="Cook's dist")
boxplot(cook)
Cook's distance in R
[figure: half-normal plot of Cook's distances and boxplot]
Randomness: runs test and Durbin-Watson test
Runs test:
- Order the residuals (in time or x order).
- Count the number of runs (r) and the numbers of positive and negative residuals, say n_1 and n_2.
- If n_1 ≤ 20 and n_2 ≤ 20, reject the hypothesis of randomness if r < r_L or if r > r_U, where r_L and r_U are the lower and upper critical values given in Table A30 (handout).
- For large sample sizes, reject the hypothesis of randomness if |z| > z_{α/2}, where
z = (r − μ ± 0.5)/σ   (0.5 is a continuity correction, applied toward μ)
with μ = 1 + 2n_1n_2/(n_1 + n_2) and σ² = 2n_1n_2(2n_1n_2 − n_1 − n_2) / [(n_1 + n_2)²(n_1 + n_2 − 1)].
Example of randomness: runs test
> x <- c(0:9)
> y <- c(98, 135, 162, 178, 221, 232, 283, 300, 374, 395)
> lmfit <- lm(y ~ x)
> residuals(lmfit)
Example of randomness: runs test
How to do the runs test?
Runs: (+ + +) (− − − − −) (+ +)
the number of runs = 3, the number of positives = 5, the number of negatives = 5
Using Table A30, r_L = 2 and r_U = 10. If r < r_L or r > r_U, reject the hypothesis of randomness.
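The counts in this example, and the large-sample z statistic from the previous slide, can be sketched in a few lines (Python for illustration; the sign pattern is the one shown above, and the normal approximation is really meant for n_1, n_2 > 20):

```python
from math import sqrt

# sign pattern of the residuals from the example above
signs = [+1, +1, +1, -1, -1, -1, -1, -1, +1, +1]
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a * b < 0)  # count sign changes
n1 = signs.count(+1)
n2 = signs.count(-1)

# large-sample approximation (meant for n1, n2 > 20; shown here on small data)
mu = 1 + 2 * n1 * n2 / (n1 + n2)
sigma2 = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2)**2 * (n1 + n2 - 1))
z = (runs - mu) / sqrt(sigma2)
```

Here z is negative because 3 runs is fewer than the μ = 6 expected under randomness, consistent with the table-based rejection at r_L = 2 being borderline.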
Runs test in R
library(lawstat)
lmfit <- lm(y ~ x)
runs.test(residuals(lmfit))

> runs.test(residuals(lmfit))
Runs Test - Two sided
data: residuals(lmfit)
Standardized Runs Statistic = , p-value =
Randomness: Durbin-Watson test
The Durbin-Watson test checks whether the error terms ε_i are independent (H_0: ρ = 0). The test statistic is
D = Σ_{t=2}^{n} (r_t − r_{t−1})² / Σ_{t=1}^{n} r_t²
where r_t = Y_t − Ŷ_t.
If D > d_U, conclude H_0. If D < d_L, conclude H_a. If d_L < D < d_U, the test is inconclusive.
d_L and d_U are selected based on the level of the test, the number of X variables (p − 1), and the sample size n.
DW test in R
> library(lmtest)
> lmfit <- lm(y ~ x)
> dwtest(lmfit)

Durbin-Watson test
data: lmfit
DW = 1.875, p-value =
alternative hypothesis: true autocorrelation is greater than 0
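The D statistic itself is just a ratio of sums of squares of the residuals kept in time order. A numpy sketch on the earlier example data (Python for illustration; dwtest in R additionally supplies the p-value, which this sketch does not):

```python
import numpy as np

# the example data again, fit by least squares; residuals kept in time order
x = np.arange(10, dtype=float)
y = np.array([98., 135., 162., 178., 221., 232., 283., 300., 374., 395.])
X = np.column_stack([np.ones(10), x])
r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# D = sum_{t=2}^n (r_t - r_{t-1})^2 / sum_{t=1}^n r_t^2
D = float(np.sum(np.diff(r)**2) / np.sum(r**2))
```

D always lies in [0, 4], with values near 2 indicating no autocorrelation (roughly D ≈ 2(1 − ρ̂)).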
Constant variance: Brown-Forsythe and Breusch-Pagan tests
Brown-Forsythe (Levene-type) test:
r_i1, r_i2: the ith residual in group 1 and group 2
n_1, n_2: the sample size of each group
r̃_1, r̃_2: the median residual of each group
d_i1 = |r_i1 − r̃_1|, d_i2 = |r_i2 − r̃_2|
The two-sample t test statistic is
t_BF = (d̄_1 − d̄_2) / (s √(1/n_1 + 1/n_2))
where s² = [Σ_i (d_i1 − d̄_1)² + Σ_i (d_i2 − d̄_2)²] / (n − 2).
Breusch-Pagan test: to test H_0: γ_1 = 0 in
log_e σ_i² = γ_0 + γ_1 X_i
The test statistic is
X²_BP = (SSR*/2) / (SSE/n)²
where SSR* is the regression sum of squares when regressing r² on X and SSE is the error sum of squares when regressing Y on X.
BF tests in R
# best way to split into two groups: one with low values and the other with large values of X
g1 <- c( , , , , )
g2 <- c( , , , , )
d1 <- abs(g1 - median(g1))
d2 <- abs(g2 - median(g2))
t.test(d1, d2)

Welch Two Sample t-test
data: d1 and d2
t = , df = 7.11, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x  mean of y
BF tests in R
library(lawstat)
lmfit <- lm(y ~ x)
levene.test(residuals(lmfit), group)

> levene.test(residuals(lmfit), group=c(rep(1,5), rep(0,5)))
Classical Levene's test based on the absolute deviations from the mean
data: residuals(lmfit)
Test Statistic = 0.0708, p-value =
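The t_BF statistic from the formula two slides back can be computed directly from the two groups of residuals. A numpy sketch (Python for illustration; the residual values are made up, with group 2 deliberately more spread out):

```python
import numpy as np

# made-up residuals: group 1 from low x, group 2 (more spread) from high x
g1 = np.array([2.1, -1.3, 0.4, -0.8, 1.2])
g2 = np.array([5.6, -4.2, 3.8, -6.1, 4.9])
d1 = np.abs(g1 - np.median(g1))   # absolute deviations from the group medians
d2 = np.abs(g2 - np.median(g2))
n1, n2 = len(d1), len(d2)

# pooled variance and the Brown-Forsythe two-sample t statistic from the slide
s2 = (np.sum((d1 - d1.mean())**2) + np.sum((d2 - d2.mean())**2)) / (n1 + n2 - 2)
t_bf = float((d1.mean() - d2.mean()) / np.sqrt(s2 * (1/n1 + 1/n2)))
```

A large |t_BF| (compared with t_{n−2}) indicates that the residual spread differs between the low-x and high-x groups, that is, nonconstant variance.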
BP test in R
library(lmtest)
lmfit <- lm(y ~ x)
bptest(lmfit)

> bptest(lmfit)
studentized Breusch-Pagan test
data: lmfit
BP = 3.0628, df = 1, p-value =
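The X²_BP statistic is built from two regressions: Y on X (giving SSE) and r² on X (giving SSR*). A numpy sketch of the original (non-studentized) form from the slide, on simulated heteroskedastic data (Python for illustration; R's bptest reports the studentized variant by default, so the numbers will differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0, 10, n))
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0 + 0.3 * x)   # error s.d. grows with x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta
sse = float(np.sum(r**2))                     # SSE from regressing y on x

# regress r^2 on x; SSR* is the regression sum of squares of that fit
g = np.linalg.lstsq(X, r**2, rcond=None)[0]
ssr_star = float(np.sum((X @ g - np.mean(r**2))**2))

bp = (ssr_star / 2) / (sse / n)**2            # X^2_BP, compared to chi-square(1)
```

Under H_0 (constant variance) X²_BP is approximately chi-square with 1 degree of freedom here, since only one γ coefficient is tested.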
Test of normality
Shapiro-Wilk test: H_0: the sample y_1, ..., y_n came from a normally distributed population. The test statistic is
W = (Σ_i a_i y_(i))² / Σ_{i=1}^{n} (y_(i) − ȳ)²
where y_(i) is the ith order statistic and the constants a_i are given by
(a_1, ..., a_n) = mᵗ V⁻¹ / (mᵗ V⁻¹ V⁻¹ m)^{1/2}
where m = (m_1, ..., m_n)ᵗ, m_i is the expected value of the ith order statistic of iid random variables from the standard normal distribution, and V is the covariance matrix of those order statistics.
If W is too small, reject the null hypothesis.
Shapiro-Wilk in R
library(stats)
lmfit <- lm(y ~ x)
shapiro.test(residuals(lmfit))

> shapiro.test(residuals(lmfit))
Shapiro-Wilk normality test
data: residuals(lmfit)
W = 0.9073, p-value =
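The same test is available in Python via scipy, which is convenient for checking that W behaves as described, near 1 for data actually drawn from a normal distribution (a sketch; the simulated sample stands in for regression residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=50)          # stand-in for regression residuals
W, pval = stats.shapiro(sample)       # small W => evidence against normality
```

scipy computes the a_i coefficients internally, so only the sample is needed.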
Test of normality
Kolmogorov-Smirnov test: the empirical distribution function F_n for n iid observations Y_i is defined as
F_n(y) = (1/n) Σ_{i=1}^{n} I(Y_i ≤ y)
where I(·) is the indicator function. The Kolmogorov-Smirnov statistic is
D_n = sup_y |F_n(y) − F(y)|
If D_n is big, reject the null.
Correlation test: the idea is to compute the correlation between the expected normal quantiles and the observed order statistics.
Anderson-Darling test: a distance (empirical distribution) test; use with small sample sizes n.
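Because F_n is a step function, the supremum D_n is attained at the jump points, so it can be computed from the sorted sample alone. A numpy/scipy sketch (Python for illustration) that also cross-checks against scipy's kstest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ysamp = np.sort(rng.normal(size=100))
n = len(ysamp)

# D_n = sup_y |F_n(y) - F(y)|; the sup is attained at the jump points of F_n
F = stats.norm.cdf(ysamp)
Dn = max(np.max(np.arange(1, n + 1) / n - F),   # F_n just after each jump
         np.max(F - np.arange(0, n) / n))       # F_n just before each jump

D_ref = stats.kstest(ysamp, 'norm').statistic   # same statistic from scipy
```

The two one-sided maxima are needed because the largest gap can occur on either side of each step.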
Anderson-Darling test in R
library(nortest)
ad.test(residuals(lmfit))

> ad.test(residuals(lmfit))
Anderson-Darling normality test
data: residuals(lmfit)
A = 0.4495, p-value =
PP plot and QQ plot
Plots for comparing two probability distributions. There are two basic types, the probability-probability plot and the quantile-quantile plot.
A plot of points whose coordinates are the cumulative probabilities {p_x(q), p_y(q)} for different values of q is a probability-probability plot, while a plot of the points whose coordinates are the quantiles {q_x(p), q_y(p)} for different values of p is a quantile-quantile plot.
The latter is the more frequently used of the two types, and one common use is to investigate the assumption that a set of data is from a normal distribution. For example, plot the ordered sample values y_(1), ..., y_(n) against the quantiles of a standard normal distribution, Φ⁻¹(p_i), where
p_i = (i − 1/2)/n and Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−μ²/2} dμ.
This is usually known as a normal probability plot.
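The coordinates of a normal probability plot can be built with nothing beyond the standard library, using the p_i = (i − 1/2)/n positions above (a sketch in Python; the sample values are made up):

```python
from statistics import NormalDist

# made-up sample; pair its order statistics with standard normal quantiles
ysorted = sorted([2.3, -0.5, 1.1, 0.2, -1.7, 0.9, -0.1, 1.8])
n = len(ysorted)
p = [(i - 0.5) / n for i in range(1, n + 1)]       # p_i = (i - 1/2)/n
theo_q = [NormalDist().inv_cdf(pi) for pi in p]    # Phi^{-1}(p_i)
pairs = list(zip(theo_q, ysorted))                 # points of the normal probability plot
```

If the sample is approximately normal, these points fall close to a straight line; departures in the tails show up as curvature at the ends of the plot.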
Normal QQ plot in R
library(faraway)
qqnorm(residuals(lmfit), ylab="Residuals")
qqline(residuals(lmfit))
Normal QQ plot in R
[figure: normal Q-Q plot of the residuals and histogram of residuals(lmfit)]
Lack of fit test
Idea: if you have multiple observations of y at the same x values, you can use these to test for lack of fit.
Basis: if the fit is good, the fitted line should go through the mean of the y's at each x; if the fit is bad, the fitted values should differ from the means.
Linear lack of fit test
This test assumes variance homogeneity.
Goal: check the linearity of the conditional mean of Y given X.
Requirement: one has to have replicates in X.
Data:
x_1: y_11, y_12, ..., y_1n_1
x_2: y_21, y_22, ..., y_2n_2
...
x_k: y_k1, y_k2, ..., y_kn_k
Some of the n_1, n_2, ..., n_k have to be > 1.
Linear lack of fit test
Model: y_ij = β_0 + β_1 x_i + ε_ij, i = 1, ..., k, j = 1, 2, ..., n_i, where ε_ij ~ [0, σ²]
Model: y_ij = β_0 + β_1 x_i + σ ε_ij, i = 1, ..., k, j = 1, 2, ..., n_i, where ε_ij ~ [0, 1]
These are the same model.
Linear lack of fit test
Model: y_ij = β_0 + β_1 x_i + σ ε_ij, i = 1, ..., k, j = 1, 2, ..., n_i, where ε_ij ~ [0, 1]
How many replicates in total? n_1 + n_2 + ... + n_k = n.
Remark 1: assume independent, normally distributed errors with a constant variance.
Linear lack of fit test
In matrix form,
y = (y_11, ..., y_1n_1, y_21, ..., y_2n_2, ..., y_k1, ..., y_kn_k)ᵗ = X (β_0, β_1)ᵗ + ε
where X stacks the blocks (1_{n_i}, x_i 1_{n_i}) for i = 1, ..., k, and 1_{n_i} denotes a column vector of n_i ones.
ANOVA table for lack of fit test
ANOVA decomposition: model SS = ΣΣ (Ŷ_ij − Ȳ)², residual SS = ΣΣ (Y_ij − Ŷ_ij)², total SS = ΣΣ (Y_ij − Ŷ_ij)² + ΣΣ (Ŷ_ij − Ȳ)².
Decompose the residual sum of squares, SSE = SSPE + SSLOF, using
Y_ij − Ŷ_ij = (Y_ij − Ȳ_i) + (Ȳ_i − Ŷ_ij)
SSPE: sum of squared pure errors = ΣΣ (Y_ij − Ȳ_i)²
SSLOF: sum of squares for lack of fit = ΣΣ (Ŷ_ij − Ȳ_i)²
H_0: the linear model fits the data well. H_1: the linear model does not fit the data.
If SSLOF is large, there is a lack of fit.
F = [ΣΣ (Ŷ_ij − Ȳ_i)² / df_1] / [ΣΣ (Y_ij − Ȳ_i)² / df_2] = (SSLOF/df_1) / (SSPE/df_2) ~ F_{df_1, df_2}
Reject H_0 if F > F_{df_1, df_2}(1 − α) for a level-α test.
Degrees of freedom in ANOVA
Find df_1 and df_2. Think of the example of two populations, where we used the pooled sample variance
s_p² = [Σ_j (Y_1j − ȳ_1)² + Σ_j (Y_2j − ȳ_2)²] / (n_1 + n_2 − 2).
Now we have k groups:
s_p² = ΣΣ (y_ij − ȳ_i)² / (n_1 + n_2 + ... + n_k − k) = SSPE / (n − k)
df_2 = n − k, df_1 = df(residual) − df_2 = (n − 2) − (n − k) = k − 2.
ANOVA table
Source       SS                   df
Regression   ΣΣ (Ŷ_ij − Ȳ)²       1
Residual     ΣΣ (Y_ij − Ŷ_ij)²    n − 2
  LoF        ΣΣ (Ŷ_ij − Ȳ_i)²     k − 2
  PE         ΣΣ (Y_ij − Ȳ_i)²     n − k
F_LOF = [SSLOF / (k − 2)] / [SSPE / (n − k)] ~ F_{k−2, n−k}
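The whole table can be reproduced from replicated data in a few lines. A numpy sketch (Python for illustration; the replicated data are made up), which also confirms the decomposition SSE = SSPE + SSLOF:

```python
import numpy as np

# made-up replicated data: k = 4 distinct x values with repeats
x = np.array([1., 1., 1., 2., 2., 3., 3., 4., 4., 4.])
y = np.array([2.1, 1.9, 2.3, 3.8, 4.1, 6.5, 6.3, 8.2, 7.9, 8.1])
n, k = len(y), len(np.unique(x))

X = np.column_stack([np.ones(n), x])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]      # fitted line
ybar_i = np.array([y[x == xi].mean() for xi in x])    # group mean for each obs

sspe = float(np.sum((y - ybar_i)**2))      # pure error
sslof = float(np.sum((yhat - ybar_i)**2))  # lack of fit
F = (sslof / (k - 2)) / (sspe / (n - k))   # compare with F_{k-2, n-k}
```

The decomposition is exact because, within a group, ŷ is constant and the deviations from the group mean sum to zero, so the cross term vanishes.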
SSLOF and SSPE
In matrix form,
SSLOF = yᵗ A_1 y = yᵗ (J* − H) y
SSPE = yᵗ A_2 y = yᵗ (I − J*) y
where J* = blockdiag((1/n_1) J_{n_1×n_1}, ..., (1/n_k) J_{n_k×n_k}) and J_{n×n} denotes the n×n matrix of ones.
Remedial actions
- Change the model if it appears there is nonlinearity but homogeneity of variance
- Transform if there is heterogeneity of variance and nonlinearity
- Consider weighted least squares if there is only heterogeneity of variance
- Delete outliers
- Fit a robust model (loess, etc.)
More informationAny of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.
STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed
More informationHandout 4: Simple Linear Regression
Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:
More informationSolutions to Final STAT 421, Fall 2008
Solutions to Final STAT 421, Fall 2008 Fritz Scholz 1. (8) Two treatments A and B were randomly assigned to 8 subjects (4 subjects to each treatment) with the following responses: 0, 1, 3, 6 and 5, 7,
More informationEconometrics of Panel Data
Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical
More informationSTA2601. Tutorial letter 203/2/2017. Applied Statistics II. Semester 2. Department of Statistics STA2601/203/2/2017. Solutions to Assignment 03
STA60/03//07 Tutorial letter 03//07 Applied Statistics II STA60 Semester Department of Statistics Solutions to Assignment 03 Define tomorrow. university of south africa QUESTION (a) (i) The normal quantile
More informationUNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75
More informationBusiness Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal
Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing
More informationLecture 10 Multiple Linear Regression
Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline
More informationAnswer Keys to Homework#10
Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationIntroduction to Linear Regression Rebecca C. Steorts September 15, 2015
Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using
More informationSTATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002
Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.
More informationAssignment 9 Answer Keys
Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationApplied Regression Analysis
Applied Regression Analysis Chapter 3 Multiple Linear Regression Hongcheng Li April, 6, 2013 Recall simple linear regression 1 Recall simple linear regression 2 Parameter Estimation 3 Interpretations of
More informationThe ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test.
Lecture 11 Topic 8: Data Transformations Assumptions of the Analysis of Variance 1. Independence of errors The ε ij (i.e. the errors or residuals) are statistically independent from one another. Failure
More informationModule 6: Model Diagnostics
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 6: Model Diagnostics 6.1 Introduction............................... 1 6.2 Linear model diagnostics........................
More informationSTA 4210 Practise set 2a
STA 410 Practise set a For all significance tests, use = 0.05 significance level. S.1. A multiple linear regression model is fit, relating household weekly food expenditures (Y, in $100s) to weekly income
More informationSimple Linear Regression
Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring
More information1) Answer the following questions as true (T) or false (F) by circling the appropriate letter.
1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. T F T F T F a) Variance estimates should always be positive, but covariance estimates can be either positive
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationSTAT 571A Advanced Statistical Regression Analysis. Chapter 3 NOTES Diagnostics and Remedial Measures
STAT 571A Advanced Statistical Regression Analysis Chapter 3 NOTES Diagnostics and Remedial Measures 2015 University of Arizona Statistics GIDP. All rights reserved, except where previous rights exist.
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More information