STOR 455 STATISTICAL METHODS I

Similar documents
Lecture 3: Inference in SLR

Topic 14: Inference in Multiple Regression

Overview Scatter Plot Example

Inference for Regression Simple Linear Regression

Chapter 2 Inferences in Simple Linear Regression

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Chapter 6 Multiple Regression

Inference for Regression Inference about the Regression Model and Using the Regression Line

Lecture 11: Simple Linear Regression

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 1 Linear Regression with One Predictor Variable.p2

Ch 2: Simple Linear Regression

Lecture 10 Multiple Linear Regression

General Linear Model (Chapter 4)

a. YOU MAY USE ONE 8.5 X11 TWO-SIDED CHEAT SHEET AND YOUR TEXTBOOK (OR COPY THEREOF).

6. Multiple Linear Regression

STAT 3A03 Applied Regression With SAS Fall 2017

Statistics 512: Applied Linear Models. Topic 1

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

Course Information Text:

Chapter 1 Linear Regression with One Predictor

Statistical Techniques II EXST7015 Simple Linear Regression

Simple Linear Regression

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

Lecture 11 Multiple Linear Regression

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

ST505/S697R: Fall Homework 2 Solution.

df=degrees of freedom = n - 1

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Inferences for Regression

STAT 350 Final (new Material) Review Problems Key Spring 2016

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

EXST7015: Estimating tree weights from other morphometric variables Raw data print

ST Correlation and Regression

Ch 3: Multiple Linear Regression

Topic 20: Single Factor Analysis of Variance

Lecture 12 Inference in MLR

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

Lecture notes on Regression & SAS example demonstration

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

TMA4255 Applied Statistics V2016 (5)

3 Variables: Cyberloafing Conscientiousness Age

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter.

Lecture 10: 2 k Factorial Design Montgomery: Chapter 6

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Section Least Squares Regression

Inference for the Regression Coefficient

Correlation Analysis

SIMPLE LINEAR REGRESSION

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 8 Quantitative and Qualitative Predictors

Lecture 13 Extra Sums of Squares

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total

ECO220Y Simple Regression: Testing the Slope

Regression Models - Introduction

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Lecture 9 SLR in Matrix Form

Biostatistics 380 Multiple Regression 1. Multiple Regression

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

Topic 7: Analysis of Variance

A discussion on multiple regression models

Topic 28: Unequal Replication in Two-Way ANOVA

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Models for Clustered Data

Models for Clustered Data

Concordia University (5+5)Q 1.

Applied Regression Analysis

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

14 Multiple Linear Regression

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Confidence Interval for the mean response

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

ANOVA (Analysis of Variance) output RLS 11/20/2016

CHAPTER 2: Assumptions and Properties of Ordinary Least Squares, and Inference in the Linear Regression Model

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

Variance Decomposition and Goodness of Fit

One-Way Analysis of Variance (ANOVA) There are two key differences regarding the explanatory variable X.

The simple linear regression model discussed in Chapter 13 was written as

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Failure Time of System due to the Hot Electron Effect

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Lecture 6 Multiple Linear Regression, cont.

Lecture 3. Experiments with a Single Factor: ANOVA Montgomery 3-1 through 3-3

Statistical Modelling in Stata 5: Linear Models

Outline Topic 21 - Two Factor ANOVA

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Basic Business Statistics 6 th Edition

Transcription:

STOR 455 STATISTICAL METHODS I Jan Hannig

Mul9variate Regression Y=X β + ε X is a regression matrix, β is a vector of parameters and ε are independent N(0,σ) Es9mated parameters b=(x X) - 1 X Y Predicted responses Ŷ=H Y, H=X (X X) - 1 X Residuals e=(i- H)Y Es9mated regression variance s 2 =e e/(n- p)

Board Example Simple linear regression

Residual Analysis (Sec9on 4.5) Recall H=X(X X) - 1 X - matrix H=(h ij ) Standardized residuals r i = Similarly as with SLR we should look at Plot of r vs predictors X i (p plots) Plot of r vs predicted values Ŷ Gaussian QQ- Plot of r e i s 1 h ii

Do It in SAS *Data shown on page 237 of the OPTIONAL textbook - file CH06FI05.txt; data studios; input x1 x2 y; x1x2=x1*x2; label x1='targtpop' x2='dispoinc'; cards; 68.5 16.7 174.4 45.2 16.8 164.4 91.3 18.2 244.2... 52.3 16.0 166.5 ; run;

Do it in SAS * plot residuals, QQ plot; proc reg data = studios noprint; model y = x1 x2; plot student. * (x1 x2 p.) ; plot student. * nqq.; run; *Alternative way of plotting; proc reg data = studios; model y = x1 x2; output out=output p = fitted student = residual; run; proc gplot data = output; plot residual*fitted; plot residual*x1; plot residual*x2; plot residual*x1x2; run; proc univariate data = output noprint ; qqplot residual / normal; run; *End of an alternative way of plotting;

Do it in SAS

Inference for b (Sec9on 4.6+4.7) b ~ N(β, σ 2 (X X) 1 ) Es9mate Var(b i ) by s 2 (b i ) =MSE* (X X) 1 i,i CI: b i ± t * s(b i ) Significance test for H 0i : β i, = 0 uses the test sta9s9c t =b i /s(b i ), df=dfe=n- p, and the P- value computed from the t(n- p) distribu9on Studios example (proc reg, op9on /clb)

Do it in SAS proc reg data=studios; model y=x1 x2/clb clm cli alpha=0.01; /*clb CI for b; clm CI for mean; cli CI for prediction; */ model y=x1 x2/xpx i covb corrb; /* xpx gives the X'X matrix i gives the inverse of the X'X matrix covb gives the variance covariance matrix of b corrb gives the correlation matrix of b*/ run;

Do it in SAS The REG Procedure Model: MODEL1 Dependent Variable: y Output Statistics Dependent Predicted Std Error Obs Variable Value Mean Predict 99% CL Mean 99% CL Predict Residual 1 174.4000 187.1841 3.8409 176.1283 198.2400 153.6265 220.7418-12.7841 2 164.4000 154.2294 3.5558 143.9944 164.4645 120.9332 187.5257 10.1706 3 244.2000 234.3963 4.5882 221.1895 247.6032 200.0699 268.7228 9.8037 4 154.6000 153.3285 3.2331 144.0223 162.6347 120.3060 186.3511 1.2715 5 181.6000 161.3849 4.4300 148.6334 174.1365 127.2311 195.5388 20.2151... 20 224.1000 230.3161 5.8120 213.5865 247.0457 194.4864 266.1457-6.2161 21 166.5000 157.0644 4.0792 145.3228 168.8060 123.2746 190.8542 9.4356 Sum of Residuals 0 Sum of Squared Residuals 2180.92741 Predicted Residual SS (PRESS) 3002.92331

Do it in SAS Model Crossproducts X'X X'Y Y'Y Variable Label Intercept x1 x2 y Intercept Intercept 21 1302.4 360 3820 x1 targtpop 1302.4 87707.94 22609.19 249643.35 x2 dispoinc 360 22609.19 6190.26 66072.75 y 3820 249643.35 66072.75 721072.4 X'X Inverse, Parameter Estimates, and SSE Variable Label Intercept x1 x2 y Intercept Intercept 29.728923483 0.0721834719-1.992553186-68.85707315 x1 targtpop 0.0721834719 0.0003701761-0.005549917 1.4545595828 x2 dispoinc -1.992553186-0.005549917 0.1363106368 9.3655003765 y -68.85707315 1.4545595828 9.3655003765 2180.9274114 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 2 24015 12008 99.10 <.0001 Error 18 2180.92741 121.16263 Corrected Total 20 26196 Root MSE 11.00739 R-Square 0.9167 Dependent Mean 181.90476 Adj R-Sq 0.9075 Coeff Var 6.05118

Do it in SAS Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > t Intercept Intercept 1-68.85707 60.01695-1.15 0.2663 x1 targtpop 1 1.45456 0.21178 6.87 <.0001 x2 dispoinc 1 9.36550 4.06396 2.30 0.0333 Covariance of Estimates Variable Label Intercept x1 x2 Intercept Intercept 3602.0346743 8.7459395806-241.4229923 x1 targtpop 8.7459395806 0.0448515096-0.672442604 x2 dispoinc -241.4229923-0.672442604 16.515755794 The SAS System 22:26 Monday, October 24, 2005 7 The REG Procedure Model: MODEL2 Dependent Variable: y Correlation of Estimates Variable Label Intercept x1 x2 Intercept Intercept 1.0000 0.6881-0.9898 x1 targtpop 0.6881 1.0000-0.7813 x2 dispoinc -0.9898-0.7813 1.0000

ANOVA (Sec9on 4.8) Sources of varia9on are Model or Regression (SSM or SSR) Error or Residual (SSE) Total (SSTO) SS and df add SSM + SSE =SSTO dfm + dfe = dft

Sum of Squares

Degree of Freedom and Mean Squares df M =p- 1 df E =n- p df T =n- 1 MSM=SSM/df M MSE=SSE/df E MST=SST/df T

ANOVA Table Source SS df MS F Model SSM dfm MSM MSM/MSE Error SSE dfe MSE Total SST dft (MST)

Do it in SAS The REG Procedure Model: MODEL1 Dependent Variable: y Number of Observations Read 21 Number of Observations Used 21 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 2 24015 12008 99.10 <.0001 Error 18 2180.92741 121.16263 Corrected Total 20 26196 Root MSE 11.00739 R-Square 0.9167 Dependent Mean 181.90476 Adj R-Sq 0.9075 Coeff Var 6.05118

ANOVA F test H 0 : β 1 = β 2 = β p- 1 = 0 H a : β k neq 0, for at least one k=1,, p- 1 Under H 0, F ~ F(p- 1,n- p) Reject H 0 if F is large, use P value Studios example: see SAS output

Interpret F- test The p- value for the F significance test tells us one of the following: p- value large: there is no evidence to conclude that any of our explanatory variables can help us to model the response variable using this kind of model P- value small: one or more of the explanatory variables in our model is poten9ally useful for predic9ng the response variable in a linear model