Lecture 18 MA 2612 Applied Statistics II D 2004


Today

1. Examples of multiple linear regression
2. The modeling process (PNC 8.4)
3. The graphical exploration of multivariable data (PNC 8.5)
4. Fitting the multiple linear regression model (PNC 8.6)
5. ANOVA for the MLR model (PNC 8.11)

Examples of multiple linear regression

In Lecture 17 we saw the mathematical formulation of the multiple linear regression (MLR) model, but did not give any examples of real-life situations where the MLR model can be employed.

1. Modeling tree volume as a function of tree diameter and tree height

Data measured on 31 black cherry trees in the Allegheny National Forest in Pennsylvania are used to build a model for predicting tree volume using only measurements of the height and diameter of a tree.

  Y   = tree volume in cubic feet
  Z_1 = tree height in feet
  Z_2 = tree diameter measured 4.5 feet above ground level in inches

We might consider the following models:

  Y = β_0 + β_1 Z_1 + β_2 Z_2 + ε
  Y = β_0 + β_1 Z_1 + β_2 Z_2 + β_3 Z_1 Z_2 + ε
  Y = β_0 + β_1 Z_1 Z_2^2 + ε
  Y = β_0 + β_1 Z_1 Z_2^2 + β_2 Z_1 + ε
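Candidate models like these can be fit by least squares in any numerical package (the course uses SAS; the following is only a minimal sketch in Python with NumPy, and the data are synthetic stand-ins for the 31 cherry-tree measurements, not the real data set):

```python
import numpy as np

# Synthetic stand-in data: height Z1 in feet, diameter Z2 in inches,
# volume Y in cubic feet (coefficients chosen for illustration only).
rng = np.random.default_rng(0)
n = 31
Z1 = rng.uniform(60, 90, n)                     # tree height
Z2 = rng.uniform(8, 21, n)                      # tree diameter
Y = 0.002 * Z1 * Z2**2 + rng.normal(0, 2, n)    # volume ~ height * diameter^2

# First candidate model: Y = b0 + b1*Z1 + b2*Z2 + e
X = np.column_stack([np.ones(n), Z1, Z2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Third candidate model: Y = b0 + b1*Z1*Z2^2 + e
Xc = np.column_stack([np.ones(n), Z1 * Z2**2])
bc, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

print(b)    # least squares estimates for beta_0, beta_1, beta_2
print(bc)   # estimates for the "conical volume" model
```

Comparing how well each candidate fits (e.g., by residual sums of squares) is exactly the model-assessment step discussed below.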

2. Modeling college GPA as a function of high school grades

A university interested in predicting whether or not applicants will be successful college students modeled cumulative GPA after three semesters as a function of some of the information provided by the students on their applications for admission.

  Y   = cumulative GPA after three semesters on a 4.0 scale
  Z_1 = high school grades in mathematics courses coded on a 10.0 scale
  Z_2 = high school grades in science courses coded on a 10.0 scale
  Z_3 = high school grades in English courses coded on a 10.0 scale
  Z_4 = score on the quantitative section of the SAT out of 800 points
  Z_5 = score on the verbal section of the SAT out of 800 points

3. Modeling battery failure

Satellite applications motivated the development of a silver-zinc battery. Time until battery failure was modeled as a function of a number of design and storage parameters in order to choose parameter values that would yield batteries with maximum longevity.

  Y   = cycles to failure
  Z_1 = charge rate
  Z_2 = discharge rate
  Z_3 = depth of discharge
  Z_4 = temperature
  Z_5 = end of charge voltage

Aside: Note that there is a difference between multiple linear regression and multivariate linear regression.

  Multiple linear regression: one response and multiple predictors
  Multivariate linear regression: multiple responses and one or more predictors

The modeling process (PNC 8.4)

Having seen these examples of where the MLR model has been employed, one can see the value of the MLR model.

Recall: The purpose of the MLR model is to

1. describe the relation between the response and the predictors
2. make predictions about the response for given values of the predictors

But given a response variable and a set of predictor variables, how does one go about building an appropriate model? Model building is an iterative process.

1. Model specification - specifying the number and form of the regressor variables.
2. Model fitting - estimating the model parameters β_0, β_1, ..., β_q and σ².
3. Model assessment - using the residuals and other means to evaluate how well the model describes the relationship between the predictor variables and the response. If we determine that the model does not describe the data well, we return to step 1 and specify a new model.
4. Model validation - testing the model on another set of data.

In the remainder of the course, we will study methods associated with each of these steps.

The graphical exploration of multivariable data (PNC 8.5)

In Lab 5, using SAS/INSIGHT, you will employ the following techniques to study the relationship between the predictor variables and the response variable, as well as relationships among the predictor variables themselves.

1. Scatterplot arrays allow one to study the relationships between many pairs of variables simultaneously.
2. Rotating 3-D plots allow one to visualize the relationship between three variables in three dimensions.

Graphical exploration of the relationships between the predictors and the response is an important preliminary step in the model-building process. Before we can specify a model, we need to understand how the variables relate to one another. What we learn about the variables through this exploration will help us specify appropriate regression models.
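A pairwise correlation matrix is a quick numeric companion to a scatterplot array: each entry summarizes the strength of the linear relationship one would see in the corresponding panel. A small sketch in Python with NumPy on hypothetical data (variable names and coefficients are illustrative, not from the lab):

```python
import numpy as np

# Hypothetical response Y and predictors Z1, Z2, with Z2 built to be
# correlated with Z1 so the predictor-predictor relationship shows up.
rng = np.random.default_rng(1)
n = 50
Z1 = rng.normal(0, 1, n)
Z2 = 0.8 * Z1 + rng.normal(0, 0.6, n)
Y = 2 + Z1 + 0.5 * Z2 + rng.normal(0, 1, n)

# Rows/columns are ordered Y, Z1, Z2.  A scatterplot array shows the
# same pairs graphically; a strong Z1-Z2 correlation here is the kind
# of predictor-predictor relationship worth noticing before modeling.
R = np.corrcoef(np.vstack([Y, Z1, Z2]))
print(np.round(R, 2))
```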

Fitting the multiple linear regression model (PNC 8.6)

We estimate the parameters of the model, β_0, β_1, ..., β_q, using the least squares method discussed in Chapters 7 and 9. That is to say, we choose the estimates b_0, b_1, ..., b_q so as to minimize the sum of squared error terms:

  SSE(b_0, b_1, ..., b_q) = Σ_{i=1}^{n} [Y_i − (b_0 + b_1 X_{i1} + b_2 X_{i2} + ... + b_q X_{iq})]²

As in the case of the SLR model...

We denote the estimates of β_0, β_1, ..., β_q by β̂_0, β̂_1, ..., β̂_q and write the fitted regression equation

  Ŷ = β̂_0 + β̂_1 X_1 + β̂_2 X_2 + ... + β̂_q X_q

The residuals are the observed error terms, the differences between the observed and fitted values for each observation:

  e_i = Y_i − Ŷ_i = Y_i − (β̂_0 + β̂_1 X_{i1} + β̂_2 X_{i2} + ... + β̂_q X_{iq})

We estimate the error term variance σ² by

  σ̂² = (1 / (n − q − 1)) Σ_{i=1}^{n} (e_i − ē)² = (1 / (n − q − 1)) Σ_{i=1}^{n} e_i² = SSE / (n − q − 1) = SSE / df_E = MSE

Unlike the SLR model...

There is no simple expression for the least squares estimates β̂_0, β̂_1, ..., β̂_q in terms of the observed data (without using matrix notation). However, SAS will still compute the parameter estimates.
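In matrix notation the estimates do have a closed form, b = (XᵀX)⁻¹XᵀY, which is what any software effectively computes. A minimal sketch in Python with NumPy on synthetic data (the data and coefficients are made up for illustration; the course itself uses SAS):

```python
import numpy as np

# Synthetic data with q = 2 predictors and known coefficients.
rng = np.random.default_rng(2)
n, q = 40, 2
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1 + 2 * X1 - X2 + rng.normal(0, 0.5, n)

# Least squares via the normal equations: (X'X) b = X'Y.
X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Residuals and the error-variance estimate MSE = SSE / (n - q - 1).
e = Y - X @ b
SSE = np.sum(e**2)
MSE = SSE / (n - q - 1)
print(b, MSE)
```

Because the model includes an intercept, the residuals sum to zero, which is why the (e_i − ē)² and e_i² forms of σ̂² above agree.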

ANOVA for the MLR model (PNC 8.11)

  Source      df         SS                               MS
  Regression  q          SSR  = Σ_{i=1}^{n} (Ŷ_i − Ȳ)²    MSR = SSR / df_R
  Error       n − q − 1  SSE  = Σ_{i=1}^{n} (Y_i − Ŷ_i)²  MSE = SSE / df_E
  Total       n − 1      SSTO = Σ_{i=1}^{n} (Y_i − Ȳ)²

Note that, as was the case in Chapters 7 and 9,

  df_T = df_R + df_E
  SSTO = SSR + SSE
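The decomposition SSTO = SSR + SSE can be checked numerically on any least squares fit that includes an intercept (the identity depends on the intercept being in the model). A sketch in Python with NumPy on synthetic data:

```python
import numpy as np

# A small synthetic fit with q = 2 predictors and an intercept.
rng = np.random.default_rng(3)
n, q = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1, n)

b = np.linalg.lstsq(X, Y, rcond=None)[0]
Yhat = X @ b
Ybar = Y.mean()

SSR = np.sum((Yhat - Ybar) ** 2)   # regression sum of squares, df = q
SSE = np.sum((Y - Yhat) ** 2)      # error sum of squares, df = n - q - 1
SSTO = np.sum((Y - Ybar) ** 2)     # total sum of squares, df = n - 1

MSR, MSE = SSR / q, SSE / (n - q - 1)
print(SSR + SSE, SSTO)             # the two totals agree
```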