22s:152 Applied Linear Regression
Generalized Least Squares
Returning to a continuous response variable Y.

Ordinary Least Squares Estimation

The classical models we have fit so far with a continuous response Y have been fit using ordinary least squares (OLS).

The model

    Y_i = β_0 + β_1·x_1i + ... + β_k·x_ki + ε_i,   with ε_i iid N(0, σ²)

is fit by minimizing the residual sum of squares,

    RSS = Σ_{i=1}^{n} (Y_i − Ŷ_i)² = Σ_{i=1}^{n} [Y_i − (β̂_0 + β̂_1·x_1i + ... + β̂_k·x_ki)]²

In matrix notation, we can write this model as

    Y = Xβ + ε,   with ε ~ N_n(0, Σ)

where Xβ is the mean structure and Σ is the error covariance matrix. For OLS,

    Σ = diag(σ², σ², ..., σ²) = σ²·I_{n×n}

The variance of the vector Y is denoted Σ, or

    V(Y) = V(Xβ + ε) = V(ε) = Σ

This Σ shows the independence of the observations (the off-diagonal entries are 0) and the constant variance (σ² down the entire diagonal).

In matrix notation, the Ordinary Least Squares (OLS) estimates of the regression coefficients β are

    β̂ = (X'X)⁻¹ X'Y

where (X'X)⁻¹ is the inverse of X'X, which exists when X is of full rank. The estimate of σ² is

    σ̂² = RSS / (n − k − 1)

But what if the observations are NOT independent, or there is NOT constant variance? (These are assumptions for OLS.) That is, what if

    V(Y) ≠ σ²·I_{n×n}

Then the appropriate estimation method for the regression coefficients may be Generalized Least Squares estimation.

CASE 1: Non-constant variance, but independence holds

In this situation we have a similar model,

    Y = Xβ + ε,   with ε ~ N_n(0, Σ)

except now

    Σ = diag(σ_1², σ_2², ..., σ_n²),   or equivalently ε_i ~ N(0, σ_i²)

[different observations i have different variances σ_i²]

Suppose we can write the variance of Y_i as a multiple of a common variance σ²:

    V(Y_i) = V(ε_i) = σ_i² = (1/w_i)·σ²

and we say observation i has weight w_i.
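As a check on the matrix formula above, here is a small sketch (simulated data, not from the notes) verifying that β̂ = (X'X)⁻¹X'Y reproduces the coefficients lm() computes:

```r
## Simulated data; solve() applies the inverse of X'X.
set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)

X <- cbind(1, x)                          # design matrix with intercept column
beta.hat <- solve(t(X) %*% X, t(X) %*% y) # (X'X)^{-1} X'Y

## Same estimates from lm():
coef(lm(y ~ x))

## sigma^2 estimate: RSS / (n - k - 1), here k = 1 predictor
rss <- sum((y - X %*% beta.hat)^2)
sigma2.hat <- rss / (n - 1 - 1)
```

The same check works with any number of predictors; only the columns of X change.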
The weights are inversely proportional to the variances of the errors (w_i = σ²/σ_i²). An observation with a smaller variance has a larger weight.

Then V(Y) = Σ with

    Σ = σ²·diag(1/w_1, 1/w_2, ..., 1/w_n) = σ²·W⁻¹

where W is an n×n diagonal matrix of the weights. This special case where Σ is diagonal is very useful, and is known as weighted least squares.

Weighted Least Squares

- A special case of Generalized Least Squares.
- Useful when errors have different variances but are all uncorrelated (independent).
- Assumes that we have some way of knowing the relative variances associated with each observation (or weights).
- Associates a weight w_i with each observation.
- Chooses β̂ = (β̂_0, β̂_1, ..., β̂_k) to minimize

    Σ_{i=1}^{n} w_i·[Y_i − (β̂_0 + β̂_1·x_1i + ... + β̂_k·x_ki)]²

In matrix form the Generalized Least Squares estimates are:

    β̂ = (X'WX)⁻¹ X'WY

    σ̂_w² = Σ_{i=1}^{n} w_i·(Y_i − Ŷ_i)² / (n − k − 1)

Notice the similarity to the OLS form, but now with the W.

Example situations:

1. If the data have been summarized and the ith response is the average of n_i observations each with constant variance σ², then Var(Y_i) = σ²/n_i and w_i = n_i.
2. If the variance is proportional to some predictor x_i, then Var(Y_i) = x_i·σ² and w_i = 1/x_i.

Example: Apple Shoots data

Using trees planted in 1933 and 1934, Bland took samples of shoots from apple trees every few days throughout the 1971 growing season (about 106 days). He counted the number of stem units per shoot. This measurement was thought to help understand the growth of the trees (fruiting and branching).

We do not know the number of stem units for every shoot, but we know the average number of stem units per shoot for all samples on a given day. We are interested in modeling the relationship between day of collection (observed) and number of stem units on a sample (not directly observed).
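A quick sketch (simulated data) confirming that the matrix form (X'WX)⁻¹X'WY matches what lm() returns with the weights option:

```r
## Errors with Var(eps_i) = (1/w_i) sigma^2, for known integer weights.
set.seed(2)
n <- 40
x <- runif(n, 1, 5)
w <- sample(1:4, n, replace = TRUE)
y <- 1 + 2 * x + rnorm(n, 0, 1 / sqrt(w))

X <- cbind(1, x)
W <- diag(w)                                       # diagonal weight matrix
beta.w <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

coef(lm(y ~ x, weights = w))   # same estimates
```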
VARIABLES

    day    days from dormancy (day of collection)
    n      number of shoots collected
    y      number of stem units per shoot
    ybar   average number of stem units per shoot for shoots collected on that day (i.e. y/n)

Notice we do not have y, and there are a variety of numbers of samples taken on a day.

> applelong
   day  n  ybar

Plot the relationship between ybar and day. [scatterplot of ybar vs. day]

If these were individual y observations, we could fit our usual linear model. But some of the observations provide more information on the conditional mean (n_i larger), and others provide less information on the conditional mean (n_i smaller).

If we assume a constant variance σ² for the simple linear regression model of y regressed on day, then these ybar observations have a non-constant variance related to n_i, with

    Var(ybar_i) = σ²/n_i

We can fit this model using Weighted Least Squares estimation with w_i = n_i. We'll use our usual lm() function, but include the weights option:

> lmout = lm(ybar ~ day, weights = n)
> summary(lmout)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              <2e-16 ***
day                                      <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*'

Residual standard error: 1.929 on 20 degrees of freedom
Multiple R-Squared: 0.9881,  Adjusted R-squared:
F-statistic: 1657 on 1 and 20 DF,  p-value: < 2.2e-16

These estimates β̂_0 and β̂_1 coincide with the simple linear regression model, but we've accounted for the non-constant variance in our observations. And the common σ̂ is the residual standard error above.
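The first example situation above (w_i = n_i for averaged responses) is easy to check by simulation. This sketch, with made-up values of σ and n_i, shows the variance of an average shrinking as σ²/n_i:

```r
## If ybar_i is the mean of n.i observations with common variance sigma^2,
## then Var(ybar_i) = sigma^2 / n.i, which motivates the weight w_i = n.i.
set.seed(6)
sigma <- 2
n.i <- 50
ybars <- replicate(10000, mean(rnorm(n.i, 0, sigma)))
var(ybars)   # close to sigma^2 / n.i = 4 / 50 = 0.08
```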
If we plot the absolute value of the raw residual e_i against the number of observations on the day n_i, we see that the observations with higher n_i tend to have lower variability:

> plot(n, abs(lmout$residuals), ylab="abs(residual)")

[plot of abs(residual) vs. n]

CASE 2: Non-independence due to Time Correlation

When we model the mean structure with ordinary least squares (OLS), the mean structure explains the general trends in the data with respect to our dependent variable and the independent variables. The leftover noise, or errors, are assumed to have no pattern (we have diagnostic plots to check this). In particular, the errors are assumed to be independent.

Suppose observations have been collected over time, and observations taken closer in time are more alike than observations taken further apart in time. This is a time-correlation situation, and we can see the correlation in the errors by plotting the residuals against time.

Example: Time as independent variable

The following scatterplot shows a positive linear trend in Y with respect to time for Time = 1, 2, 3, ...

[scatterplot of Y vs. time, with the OLS fit]

Let's look at the ordinary least squares fit. There is a pattern in the residuals suggesting residuals near to each other are similar (positively correlated).

[plot of residuals vs. time]

If a residual is positive, there's a good chance its neighboring residual is also positive. A lag plot of the residuals gives us information on this: plot each residual e_i against the previous residual in time, e_{i−1}.
Plotting each residual against the previous residual:

> lag.plot(lmout$residuals, do.lines=FALSE)

[lag plot: lmout$residuals vs. lag 1]

So, there is positive correlation in the lag residuals. The assumption of independence is violated (with respect to the assumptions of OLS). We can instead move away from OLS and incorporate this correlation into our modeling.

Autocorrelation

Autoregressive model: model a series in terms of its own past behavior.

The first-order autoregressive model, AR(1):

    Y_t = β_0 + β_1·x_t + ε_t   for t = 1, ..., T

with

    ε_t = ρ·ε_{t−1} + u_t   and   u_t ~ N(0, σ²)

|ρ| < 1 is the autocorrelation parameter; it tells how strongly the sequential observations are correlated. The t-th and the (t−j)-th errors are also correlated, but not as strongly:

    corr(ε_t, ε_{t−j}) = ρ^j

A simulation of AR(1) data from n = 50 uniformly spaced time points with a positive linear trend (with β_1 = 2) can bring insight into the AR(1) process:

## Generate x-values:
> n = 50
> time = 1:n

## Assign parameters:
> sigma = 3
> rho = 0.95
> beta = 2

## Get start point at t=1 for time series
## and allocate space for data vectors:
> y = rep(0, n)
> e = rep(0, n)
> e[1] = rnorm(1, 0, sigma)
> y[1] = beta*time[1] + e[1]

## Use AR(1) process to sequentially generate y-values:
> for (i in 2:n) {
    e[i] = rho*e[i-1] + rnorm(1, 0, sigma)
    y[i] = beta*time[i] + e[i]
  }

The data for the plots on the previous pages were made from this code.

There is also a test for time-correlated errors called the Durbin-Watson test. It actually looks for AR(1) errors, and uses H_0: ρ = 0 vs. H_A: ρ ≠ 0. The test statistic:

    d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

A small d indicates positive autocorrelation, and d = 2 suggests no positive autocorrelation.

Testing the simulated AR(1) data:

> library(car)
> durbin.watson(lmout)
 lag Autocorrelation D-W Statistic p-value
 Alternative hypothesis: rho != 0

Reject H_0; there is positive correlation.
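The variance and correlation claims for the AR(1) errors follow from stationarity; a short derivation, writing κ for the stationary variance:

```latex
\begin{aligned}
\operatorname{Var}(\epsilon_t) &= \rho^2\operatorname{Var}(\epsilon_{t-1}) + \sigma^2
  && \text{since } u_t \text{ is independent of } \epsilon_{t-1}\\
\kappa &= \rho^2\kappa + \sigma^2
  \;\Longrightarrow\; \kappa = \frac{\sigma^2}{1-\rho^2}
  && \text{setting } \operatorname{Var}(\epsilon_t)=\operatorname{Var}(\epsilon_{t-1})=\kappa\\
\operatorname{Cov}(\epsilon_t,\epsilon_{t-1})
  &= \operatorname{Cov}(\rho\epsilon_{t-1}+u_t,\ \epsilon_{t-1}) = \rho\kappa
  && \text{so } \operatorname{corr}(\epsilon_t,\epsilon_{t-1})=\rho\\
\operatorname{corr}(\epsilon_t,\epsilon_{t-j}) &= \rho^{\,j}
  && \text{by iterating the same step } j \text{ times.}
\end{aligned}
```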
The mean structure in the AR(1) model is the same as OLS, but we model the errors differently:

    Y = Xβ + ε   with ε ~ N_n(0, Σ)

and

    V(Y) = Σ =
        [ κ          κρ         κρ²        ...  κρ^{n−2}  κρ^{n−1} ]
        [ κρ         κ          κρ         ...  κρ^{n−3}  κρ^{n−2} ]
        [ ...                                                       ]
        [ κρ^{n−2}   κρ^{n−3}   κρ^{n−4}   ...  κ         κρ       ]
        [ κρ^{n−1}   κρ^{n−2}   κρ^{n−3}   ...  κρ        κ        ]

with

    Var(ε_t) = κ = σ²/(1 − ρ²)

And we again have a Generalized Least Squares estimate for β:

    β̂ = (X'Σ⁻¹X)⁻¹ X'Σ⁻¹Y

Notice the similarity to the OLS form, but now with the Σ⁻¹.

Example: Daily value of stock

The dataset soccho for this example shows the value of 1 unit of the CREF Social Choice stock fund on each day of a year starting on 10/21/99. We're interested in fitting a linear model over time, but the independence assumption for OLS is probably violated. We can use the Durbin-Watson test to determine whether this is true. If so, we will fit an AR(1) model to the data.

> head(soccho)
   account  unitval      date
 1 CREFsoci          10/21/99
 2 CREFsoci          10/22/99
 3 CREFsoci          10/23/99
 4 CREFsoci          10/24/99
 5 CREFsoci          10/25/99
 6 CREFsoci          10/26/99

As there are weekend days in the data set, we will first remove these:

> n = length(soccho$unitval)
> n
[1] 365

## 10/21/99 is a Thursday, get indices for removal:
> a1 = (seq(1, 365, 7) + 3)
> a1 = a1[-length(a1)]
> a2 = (seq(1, 365, 7) + 4)
> a2 = a2[-length(a2)]
> a = sort(c(a1, a2))

## Subset data down to weekdays:
> dayvalues = soccho$unitval[-a]
> day = 1:length(dayvalues)

> length(dayvalues)
[1] 261

> plot(day, dayvalues, pch=16)

[scatterplot of dayvalues vs. day]

It's pretty apparent that there is time-based correlation in the data, but we will fit a regular linear model assuming independence and then test for correlation over time.
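The GLS formula above can be applied directly once Σ is built from κ and ρ. A sketch on simulated AR(1) data (object names are hypothetical, and in practice ρ must itself be estimated):

```r
## Sigma has (i,j) entry kappa * rho^{|i-j|};
## beta.gls = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y
set.seed(3)
n <- 60; rho <- 0.8; sigma <- 1
e <- as.numeric(arima.sim(list(ar = rho), n = n, sd = sigma))
x <- 1:n
y <- 5 + 0.5 * x + e          # true beta_0 = 5, beta_1 = 0.5

X <- cbind(1, x)
kappa <- sigma^2 / (1 - rho^2)
Sigma <- kappa * rho^abs(outer(1:n, 1:n, "-"))
Sinv <- solve(Sigma)
beta.gls <- solve(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% y)
beta.gls   # estimates of (beta_0, beta_1)
```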
> lmout = lm(dayvalues ~ day)
> summary(lmout)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              <2e-16 ***
day                                      <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*'

Residual standard error: 1.894 on 259 degrees of freedom
Multiple R-Squared: 0.5531,  Adjusted R-squared:
F-statistic: 320.6 on 1 and 259 DF,  p-value: < 2.2e-16

[scatterplot of dayvalues vs. day with the OLS fitted line]

A plot of the residuals vs. fitted values also shows the time-based correlation:

> plot(lmout$fitted.values, lmout$residuals, pch=16)
> abline(h=0)

[plot of lmout$residuals vs. lmout$fitted.values]

Residuals that are positive tend to be near other positive residuals, and vice versa for negative residuals. This is more apparent in a lag plot, where we plot a residual vs. its neighboring residual, e_i vs. e_{i−1}:

> lag.plot(lmout$residuals, do.lines=FALSE)

[lag plot of lmout$residuals at lag 1]

There is a positive correlation in the lag residuals (residuals tend to be more like their near neighbors).

We can use the Durbin-Watson test to formally test for time dependence (it uses the relationship between e_i and e_{i−1}):

> library(car)
> durbin.watson(lmout)
 lag Autocorrelation D-W Statistic p-value
 Alternative hypothesis: rho != 0

The test strongly rejects the null of independence (H_0: ρ = 0). We will fit a first-order autoregressive model to the data, an AR(1).
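The Durbin-Watson statistic defined earlier is simple to compute by hand from a vector of residuals; a small sketch (the helper name dw is made up):

```r
## d = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2
dw <- function(e) sum(diff(e)^2) / sum(e^2)

## On independent noise, d should be near 2:
set.seed(4)
dw(rnorm(500))

## On positively autocorrelated noise, d falls well below 2
## (roughly d = 2(1 - rho-hat)):
dw(as.numeric(arima.sim(list(ar = 0.9), n = 500)))
```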
Fitting the AR(1) model

The gls() function [generalized least squares] in the nlme library [non-linear mixed effects] fits regression models with a variety of correlated-error and non-constant error-variance structures.

> library(nlme)
## The ~1 below says the data is in order by time
> glsout = gls(dayvalues ~ day, correlation = corAR1(form = ~1))
> summary(glsout)
Generalized least squares fit by REML
  Model: dayvalues ~ day
  Data: NULL
  AIC   BIC   logLik

Correlation Structure: AR(1)
 Formula: ~1
 Parameter estimate(s):
      Phi

Coefficients:
            Value  Std.Error  t-value  p-value
(Intercept)
day

Residual standard error:
Degrees of freedom: 261 total; 259 residual

Day is a significant linear predictor for stock price. ρ̂ = 0.9413, so sequential observations are strongly correlated.

Comments:

1. When you have many covariates, you can plot the residuals from the OLS fitted model against time as a time-correlation diagnostic. If there is time-correlation, this plot will show a pattern rather than a random scatter.

2. Including time as a predictor does not necessarily remove time-correlated errors. As in the soccho example, time was a predictor in the OLS model, which meant there was a general trend over time, but there was still correlation in the errors after time was included.
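To confirm that gls() with corAR1 behaves as described, here is a self-contained sketch on simulated AR(1) data (nlme ships with R; the data and object names are made up):

```r
library(nlme)

## Simulate a linear trend with AR(1) errors, rho = 0.9
set.seed(5)
n <- 200
day <- 1:n
e <- as.numeric(arima.sim(list(ar = 0.9), n = n))
vals <- 2 + 0.5 * day + e
d <- data.frame(day = day, vals = vals)

fit <- gls(vals ~ day, data = d, correlation = corAR1(form = ~1))

## The Phi parameter in summary(fit) estimates rho; it can also be
## extracted from the fitted correlation structure:
phi <- coef(fit$modelStruct$corStruct, unconstrained = FALSE)
phi   # should be near 0.9
```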
Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent
More informationIntroduction to Linear Regression Rebecca C. Steorts September 15, 2015
Introduction to Linear Regression Rebecca C. Steorts September 15, 2015 Today (Re-)Introduction to linear models and the model space What is linear regression Basic properties of linear regression Using
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationMatrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =
Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write
More information1 Least Squares Estimation - multiple regression.
Introduction to multiple regression. Fall 2010 1 Least Squares Estimation - multiple regression. Let y = {y 1,, y n } be a n 1 vector of dependent variable observations. Let β = {β 0, β 1 } be the 2 1
More informationHandout 4: Simple Linear Regression
Handout 4: Simple Linear Regression By: Brandon Berman The following problem comes from Kokoska s Introductory Statistics: A Problem-Solving Approach. The data can be read in to R using the following code:
More informationBIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression
BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 14 1 / 64 Data structure and Model t1 t2 tn i 1st subject y 11 y 12 y 1n1 2nd subject
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationEconometrics of Panel Data
Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within
More informationModeling the Covariance
Modeling the Covariance Jamie Monogan University of Georgia February 3, 2016 Jamie Monogan (UGA) Modeling the Covariance February 3, 2016 1 / 16 Objectives By the end of this meeting, participants should
More informationEconometrics of Panel Data
Econometrics of Panel Data Jakub Mućk Meeting # 4 Jakub Mućk Econometrics of Panel Data Meeting # 4 1 / 30 Outline 1 Two-way Error Component Model Fixed effects model Random effects model 2 Non-spherical
More informationInference for Regression
Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering
More informationFormulary Applied Econometrics
Department of Economics Formulary Applied Econometrics c c Seminar of Statistics University of Fribourg Formulary Applied Econometrics 1 Rescaling With y = cy we have: ˆβ = cˆβ With x = Cx we have: ˆβ
More informationRegression Analysis II
Regression Analysis II Measures of Goodness of fit Two measures of Goodness of fit Measure of the absolute fit of the sample points to the sample regression line Standard error of the estimate An index
More information11.1 Gujarati(2003): Chapter 12
11.1 Gujarati(2003): Chapter 12 Time Series Data 11.2 Time series process of economic variables e.g., GDP, M1, interest rate, echange rate, imports, eports, inflation rate, etc. Realization An observed
More informationChapter 12: Multiple Linear Regression
Chapter 12: Multiple Linear Regression Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 55 Introduction A regression model can be expressed as
More informationEmpirical Market Microstructure Analysis (EMMA)
Empirical Market Microstructure Analysis (EMMA) Lecture 3: Statistical Building Blocks and Econometric Basics Prof. Dr. Michael Stein michael.stein@vwl.uni-freiburg.de Albert-Ludwigs-University of Freiburg
More informationThe Multiple Regression Model Estimation
Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:
More informationLecture 9 STK3100/4100
Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More information