Lecture: Introduction to Regression


Q3 2009

Before/After Transformation

Role of t-ratios: construction. Formally, even under the null hypothesis $H_0\colon \beta_k = 0$, the t-ratio $t_k = \hat{\beta}_k / \mathrm{SE}(\hat{\beta}_k)$, being computed from $y$ values that themselves contain random error, will sometimes be large in magnitude for reasons of random chance only. In (conceptual) replications, differing from the current data by chance alone, the probability of obtaining, by chance, a t-ratio larger in magnitude than the quoted value is the quoted p-value.

Role of t-ratios: use. Informally, if $t_k = \hat{\beta}_k / \mathrm{SE}(\hat{\beta}_k)$ is not large (> 2 in magnitude; p > 0.05 or so), then the coefficient of $x_k$ can be given a value of zero, or equivalently $x_k$ can be dropped from the model, with little appreciable impact. Caution: this applies one variable at a time.
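A minimal sketch of this computation in Python with statsmodels; the data are simulated for illustration (one genuinely irrelevant predictor), not taken from the lecture.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 0.8 * x1 + 0.0 * x2 + rng.normal(0, 1.5, n)  # x2 is truly irrelevant

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# t-ratio for each coefficient: t_k = beta_hat_k / SE(beta_hat_k)
for name, b, se, t, p in zip(["const", "x1", "x2"],
                             fit.params, fit.bse, fit.tvalues, fit.pvalues):
    print(f"{name:5s} coef={b:7.3f} SE={se:6.3f} t={t:6.2f} p={p:.3f}")
# |t| > 2 corresponds roughly to p < 0.05; here x2 can typically be dropped
# with little impact, but remember: one variable at a time.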

Residuals

Regression Analysis: Taste versus Lactic Acid, LAcetic, LH2S

Taste = -28.9 + 19.7 Lactic Acid + 0.33 LAcetic + 3.91 LH2S

Predictor      Coef   SE Coef      T      P
Constant     -28.87     19.74  -1.46  0.156
Lactic Acid  19.670     8.629   2.28  0.031
LAcetic       0.327     4.461   0.07  0.942
LH2S          3.912     1.249   3.13  0.004

S = 10.1307   R-Sq = 65.2%   R-Sq(adj) = 61.2%

Unusual Observations
      Lactic
Obs     Acid  Taste    Fit  SE Fit  Residual  St Resid
 15     1.52  54.90  29.45    3.04     25.45     2.63R

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       3  4994.5  1664.8  16.22  0.000
Residual Error  26  2668.4   102.6
Total           29  7662.9

Source       DF  Seq SS
Lactic Acid   1  3800.4
LAcetic       1   186.5
LH2S          1  1007.6
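The same fit can be reproduced in Python with statsmodels' formula interface; this is a sketch assuming the cheese-tasting data sit in a CSV file (the file name and column names here are guesses, with the space in "Lactic Acid" removed).

import pandas as pd
import statsmodels.formula.api as smf

cheese = pd.read_csv("cheese.csv")  # assumed columns: Taste, LacticAcid, LAcetic, LH2S
fit = smf.ols("Taste ~ LacticAcid + LAcetic + LH2S", data=cheese).fit()

print(fit.summary())       # coefficients, SEs, t-ratios, p-values, R-sq, ANOVA F
print(fit.outlier_test())  # studentized deleted residuals, flags unusual observations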

Residuals: unusual observations.

      Lactic
Obs     Acid  Taste    Fit  SE Fit  Residual  St Resid
 15     1.52  54.90  29.45    3.04     25.45     2.63R

In fact this observation is barely outlying, despite the standardized residual of 2.63. Recall: some residual always has to be the largest!

Options with large residuals. Examine the case carefully: why is it outlying? Is there anything special about this case/observation? Refit without it: does its removal change anything important? If you delete it, then formally your conclusions are based on "something like this never happening in future". Is that a meaningful statement?

Residuals, standardized residuals, deleted t-residuals. [Normal-scores (normal probability) plots comparing the three types of residuals.]
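A sketch of computing all three kinds of residuals with statsmodels (simulated data, since the slide shows only the plot):

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 30)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 30)

fit = sm.OLS(y, sm.add_constant(x)).fit()
infl = fit.get_influence()

raw = fit.resid                                 # ordinary residuals
standardized = infl.resid_studentized_internal  # residual / (s * sqrt(1 - h_ii))
deleted_t = infl.resid_studentized_external     # each case deleted before scaling

# Normal scores: theoretical normal quantiles paired with the sorted residuals,
# the raw material of a normal probability plot.
osm, osr = stats.probplot(deleted_t, dist="norm", fit=False)
print(np.column_stack([osm, osr])[:5])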

Classic linear model and transformations.

$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$ (second argument the variance) or equivalently $N(0, \sigma)$ (second argument the SD).

Statistical theory assumes normally distributed residuals/'errors'; it makes NO assumptions about the distributions of $Y, X_1, \dots$ (a technical point), but it does assume additivity (crucial). Non-additivity, especially multiplicative models, and non-normality often occur together.
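A quick simulation of the point about assumptions: the predictors (and hence Y) can be arbitrarily non-normal, yet the residuals from the classic linear model are still normal. Simulated data, for illustration only.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.exponential(2.0, n)   # heavily skewed predictor: allowed by the model
eps = rng.normal(0, 0.5, n)   # the only distributional assumption is here
y = 3.0 + 1.2 * x + eps       # additivity: the crucial assumption

fit = sm.OLS(y, sm.add_constant(x)).fit()
# Shapiro-Wilk test on the residuals: normality is typically not rejected,
# even though x (and therefore y) is far from normal.
print(stats.shapiro(fit.resid))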

Normality of the data, or of the errors/residuals?

Extreme example

Random variation: additive? [Two plots: 'Exp decay: random variation decreases with time' (original scale) and 'Exp decay on log scale: random variation constant in time', both against time t from 0 to 60.]

Artificially created data.

Additive model used to create the data, then rescaled exponentiation of the line and the data; residuals by subtraction:

 t   line   error e   Y = line + e   resid (Y - line)   exp line   exp Y
 1   7.9     -0.26        7.64            -0.26           0.579    0.315
 2   7.8     -0.12        7.68            -0.12           0.460    0.350
 3   7.7      0.27        7.97             0.27           0.365    0.687
 4   7.6     -0.50        7.10            -0.50           0.290    0.091
 5   7.5     -0.67        6.83            -0.67           0.231    0.050
 6   7.4     -0.16        7.24            -0.16           0.183    0.127

Data created multiplicatively exhibit neither linearity nor normality, in either the data or the residuals. A log transform solves both issues.

Artificially created data (continued): the table above, with the errors constant by construction. [Plot: linear decay with constant random variation, over t = 0 to 60.]

Artificially created data (continued): the same table, shown alongside the two plots. [Plots: 'Exp decay: random variation decreases with time' and 'Exp decay on log scale: random variation now seems constant in time', both against time t.]

Artificially created data: distributions of the observed Y under both models.

Additive model used to create the data, then exponentiation of the line and the data:

 t   line   Y = line + e   exp line   exp Y
 1   0.9        2.64          2.46    14.05
 2   0.8        0.51          2.23     1.66
 3   0.7        0.65          2.01     1.92
 4   0.6        0.96          1.82     2.61
 5   0.5        0.76          1.65     2.14
 6   0.4        0.24          1.49     1.28
 7   0.3       -1.09          1.35     0.34
 8   0.2       -0.26          1.22     0.77
 9   0.1        0.73          1.11     2.07

Residuals for the artificially created data: rescaled residuals.

Exponentiation of the (rescaled) line and data:

 exp line   exp Y   resid by subtraction   resid by division
  0.281     0.412          0.13                  1.47
  0.223     0.125         -0.10                  0.56
  0.177     0.146         -0.03                  0.82
  0.141     0.121         -0.02                  0.86
  0.112     0.091         -0.02                  0.81
  0.089     0.052         -0.04                  0.58
  0.071     0.073          0.00                  1.04

The residuals by subtraction are erroneous: they do not reflect how the data were created.

Artificially created data (continued).

Exponentiation of the line and the data:

 exp line   exp Y   resid by subtraction   resid by division
  2.46      14.05         11.59                  5.71
  2.23       1.66         -0.56                  0.75
  2.01       1.92         -0.10                  0.95
  1.82       2.61          0.78                  1.43
  1.65       2.14          0.49                  1.30
  1.49       1.28         -0.21                  0.86
  1.35       0.34         -1.01                  0.25
  1.22       0.77         -0.45                  0.63
  1.11       2.07          0.97                  1.87
  1.00       1.75          0.75                  1.75

The residuals by division now reflect how the data were created, but they are not normal. However, they are normal after a log transformation.
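The whole construction can be replayed in a few lines of Python; the decay rate and error SD below are invented, but the logic mirrors the slides: data built multiplicatively give misleading residuals by subtraction, honest but non-normal residuals by division, and well-behaved residuals after a log transform.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
t = np.arange(1, 61)
line = 2.0 - 0.03 * t            # straight line on the log scale
e = rng.normal(0, 0.4, t.size)   # additive, normal errors on the log scale
Y = np.exp(line + e)             # observed data: multiplicative by construction

resid_sub = Y - np.exp(line)     # subtraction: spread shrinks with the level
resid_div = Y / np.exp(line)     # division: reflects creation, but lognormal
resid_log = np.log(Y) - line     # log transform: additive and normal again

for name, r in [("subtraction", resid_sub), ("division", resid_div),
                ("log scale", resid_log)]:
    print(f"{name:11s} Shapiro-Wilk p = {stats.shapiro(r).pvalue:.3f}")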

Plotting multiple regression fits.

gpm = -0.402 + 1.47 wt + 0.0240 hp/wt

a = -0.402, b1 = 1.47, b2 = 0.024; given hp/wt = 93.838, line = a + b1*wt + b2*(hp/wt):

  hp/wt     wt     line
 41.985  2.620   5.702
 38.261  2.875   6.076
 40.086  2.320   5.261
 34.215  3.215   6.576
 50.872  3.440   6.907
 30.347  3.460   6.936
 68.627  3.570   7.098
 19.436  3.190   6.539
 30.159  3.150   6.481
 35.756  3.440   6.907
 35.756  3.440   6.907
 44.226  4.070   7.833
 48.257  3.730   7.333
 47.619  3.780   7.407
 39.048  5.250   9.568
 39.639  5.424   9.823
 43.031  5.345   9.707
 30.000  2.200   5.084
 32.198  1.615   4.224
 35.422  1.835   4.548
 39.351  2.465   5.474
 42.614  3.520   7.025
 43.668  3.435   6.900
 63.802  3.840   7.495
 45.514  3.845   7.502

[Excel plot: 'Fitted line vs wt when hp/wt = 41.98'; fitted gpm against wt from 0.0 to 6.0.] (Excel plotting.)
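The same plot is easy to reproduce outside Excel; a sketch with matplotlib, using the coefficients from the slide and fixing hp/wt at the value quoted in the chart title:

import numpy as np
import matplotlib.pyplot as plt

a, b1, b2 = -0.402, 1.47, 0.0240   # gpm = a + b1*wt + b2*(hp/wt), from the slide
hp_wt_fixed = 41.98                # the other predictor is held at a single value

wt_grid = np.linspace(1.5, 5.5, 100)
gpm_line = a + b1 * wt_grid + b2 * hp_wt_fixed

plt.plot(wt_grid, gpm_line)
plt.xlabel("wt")
plt.ylabel("fitted gpm")
plt.title(f"Fitted line vs wt when hp/wt = {hp_wt_fixed}")
plt.show()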

Networks and network models. This is a very active area of advanced research (an exam question will not ask for an undiscovered network). Direct/indirect links; causal networks. There is no primary response variable. With directed arrows, regression has no useful role; with undirected arrows, regression has some role? Now we can consider regressing Y = wt on the X's hp, hp/wt and gpm. [Diagram: network with nodes hp, hp/wt, wt, gpm.]

Undirected network models: indicative research methodology.

Regression Analysis: wt versus hp, hp/wt, gpm

wt = 2.34 + 0.0159 hp - 0.0530 hp/wt + 0.175 gpm
T-ratios:      7.55      -8.88         2.99
(Deleted stuff ...)

Source  DF  Seq SS
hp       1  12.8791
hp/wt    1  14.6780
gpm      1   0.5122

Note the order: gpm adds relatively little to the prediction of wt when hp and hp/wt are already in the model. That is relatively weak evidence in favour of the wt-gpm link, so we can drop that link, but propose a new link instead. NB: this means changing the response variable. [Diagrams: tentative network model and proposed new network model, on nodes hp, hp/wt, wt, gpm.]

Undirected network models: direct/indirect links. Key: there is no primary response variable. Network order: recall that a regression fit is completely insensitive to the order in which the terms are entered, so both of the following make the same predictions. Regression is not the natural tool here, but it is relevant.

wt = 2.34 + 0.0159 hp - 0.0530 hp/wt + 0.175 gpm
wt = 2.34 - 0.0530 hp/wt + 0.175 gpm + 0.0159 hp
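A small check of that order-insensitivity claim (simulated stand-in data, not the car data from the slides):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 32
hp = rng.uniform(50, 300, n)
wt = 1.5 + 0.01 * hp + rng.normal(0, 0.4, n)
gpm = 0.02 + 0.0001 * hp + rng.normal(0, 0.005, n)
hp_wt = hp / wt

X1 = sm.add_constant(np.column_stack([hp, hp_wt, gpm]))  # one ordering of the terms
X2 = sm.add_constant(np.column_stack([hp_wt, gpm, hp]))  # another ordering

fit1 = sm.OLS(wt, X1).fit()
fit2 = sm.OLS(wt, X2).fit()

# Identical predictions whatever the order the terms are entered in
# (sequential sums of squares, by contrast, do depend on the order).
print(np.allclose(fit1.fittedvalues, fit2.fittedvalues))  # True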

Not on the exam in 2010: ANOVA and F-tables; t-tables; PIs and CIs.