
Statistical Methods in Business
Lecture 5. Linear Regression

We like to capture and represent the relationship between a set of possible causes and their response by using a statistical predictive model. Let there be k possible causes, and let the value of a possible cause be denoted by X_j, where the index j = 1, 2, 3, ..., k points at a particular possible cause.

Linear Regression Model: Assume

Y = β_0 + β_1 X_1 + β_2 X_2 + ... + β_k X_k + ε,   ε ~ N(0, σ²).

Here Y denotes the value of the response, which is generated by a linear combination of all possible causes {X_j} together with other factors contributing to the response behavior; these other factors are assumed to be independent of the cause factors and are denoted by the independent random variable ε. We also assume that this random factor behaves according to a normal (a.k.a. Gaussian) probability law: it contributes nothing to the average response behavior in the long run, yet it generates a constant variance for the response at all possible combinations of the cause values.

We wish to collect independent observations and construct a regression equation for the collected data set. Assume that we have finitely many independent observations, say n of them, in our data set. Each observation informs us about the values of each possible cause and their response:

Observation   (X_1    X_2    ...   X_k    Y)     (values)
1st           (X_11   X_12   ...   X_1k   Y_1)
2nd           (X_21   X_22   ...   X_2k   Y_2)
⋮
nth           (X_n1   X_n2   ...   X_nk   Y_n)

DATA SET

Now we can fit the model with our data set and produce n equations:

Y_1 = β_0 + β_1 X_11 + β_2 X_12 + ... + β_k X_1k + ε_1
Y_2 = β_0 + β_1 X_21 + β_2 X_22 + ... + β_k X_2k + ε_2
⋮
Y_n = β_0 + β_1 X_n1 + β_2 X_n2 + ... + β_k X_nk + ε_n

Here we know all the Y_i and X_ij values, for i = 1, 2, ..., n and j = 1, 2, ..., k, from the values recorded in our data set. But the β_j values, for j = 0, 1, 2, ..., k, are (assumed) unknown constant numbers. Therefore we will search for the best possible numerical values to estimate the unknown β_j values, using our data set information.

Now we can represent the MODEL AND DATA SET INFORMATION FITTED TOGETHER with a simple algebraic notation. Let

y = (Y_1, Y_2, ..., Y_n)^T

be the column listing of all the recorded values of the response in the data set. This is a column vector of the recorded responses, one coordinate per observation. We call y the response vector. Similarly, let

    [ 1   X_11   X_12   ...   X_1k ]
X = [ 1   X_21   X_22   ...   X_2k ]
    [ ⋮                            ]
    [ 1   X_n1   X_n2   ...   X_nk ]

collect all observed values of the possible causes in the data set, so that each row (line) starts with a constant one, followed by the ordered list of numerical values of the possible causes in that observation. This matrix of possible cause values, X, is known as the design matrix.
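As a concrete illustration (not from the lecture notes), the design matrix can be assembled in a few lines of Python with NumPy. The data values below are made up for the example.

```python
import numpy as np

# Hypothetical data set: n = 5 observations, k = 2 possible causes.
X_causes = np.array([[1.0, 2.0],
                     [2.0, 1.0],
                     [3.0, 4.0],
                     [4.0, 3.0],
                     [5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])   # response vector

# Design matrix: each row starts with a constant 1, followed by the
# observed values of the possible causes for that observation.
X = np.column_stack([np.ones(len(y)), X_causes])
```

The resulting X is the n × (k + 1) matrix described above, ready for the normal equations.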

Also, let β = (β_0, β_1, β_2, ..., β_k)^T be the column listing of all unknown parameters, known as the parameter vector. Also, let ε = (ε_1, ε_2, ..., ε_n)^T be the column listing of all error terms, known as the error vector.

Thus we can represent the MODEL AND DATA FITTED TOGETHER, all n equations at once, as

y = X β + ε.

This linear algebraic equation can easily be utilized in order to search for and find a solution for the unknown β. Our objective is to find the best possible numerical values for β, such that the estimated β minimizes the prediction error. The condition for minimizing the prediction error is that the predicted response vector, ŷ, is the orthogonal projection of the response vector, y, onto the space spanned by the possible-cause information (the column space of X). Then the magnitude of the prediction error between the actual response vector and the predicted response vector is minimized, i.e. X ⊥ ε̂, or

X^T ε̂ = 0.

Since y = X β + ε, we have ε = y − X β. Thus the minimum-prediction-error condition yields the best possible numerical estimates for β:

X^T (y − X β̂) = 0
X^T y − X^T X β̂ = 0
X^T y = X^T X β̂.

(X^T X) is a square, positive definite, (k+1) × (k+1) matrix; therefore it has an inverse, (X^T X)^{-1}. Consequently,

(X^T X)^{-1} (X^T y) = (X^T X)^{-1} (X^T X) β̂.

Since (X^T X)^{-1} (X^T X) = I_{(k+1)×(k+1)}, the identity matrix, we get

(X^T X)^{-1} (X^T y) = β̂.

Therefore, the best estimated parameter vector that minimizes the magnitude of the prediction error is

β̂ = (X^T X)^{-1} X^T y.
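A minimal sketch of the estimation step, using toy numbers that are not from the lecture. `np.linalg.solve` solves the normal equations (X^T X) β̂ = X^T y directly, which is algebraically identical to applying (X^T X)^{-1} but numerically preferable to forming the inverse.

```python
import numpy as np

# Toy design matrix (column of ones plus k = 2 causes) and response vector.
X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])

# Normal equations: (X^T X) beta_hat = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Minimum-prediction-error condition: residuals orthogonal to X,
# i.e. X^T (y - X beta_hat) should be (numerically) the zero vector.
eps_hat = y - X @ beta_hat
orthogonality = X.T @ eps_hat
```

The orthogonality check mirrors the condition X^T ε̂ = 0 derived above.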

Let β̂ = (b_0, b_1, b_2, ..., b_k)^T. Now we can construct the regression equation for the data set by using these estimated parameter values:

Ŷ = b_0 + b_1 X_1 + b_2 X_2 + ... + b_k X_k.

Thus we can estimate the response values by using the possible cause values recorded in the data set:

Ŷ_1 = b_0 + b_1 X_11 + b_2 X_12 + ... + b_k X_1k
Ŷ_2 = b_0 + b_1 X_21 + b_2 X_22 + ... + b_k X_2k
⋮
Ŷ_n = b_0 + b_1 X_n1 + b_2 X_n2 + ... + b_k X_nk

Or, better yet, we can write a column list of these estimated response values, ŷ = (Ŷ_1, Ŷ_2, ..., Ŷ_n)^T, and thus we have

ŷ = X β̂.

If we look at the differences between what is observed and what is estimated for the response, we generate residuals:

e_1 = Y_1 − Ŷ_1
e_2 = Y_2 − Ŷ_2
⋮
e_n = Y_n − Ŷ_n

Hence we can write a column list of these residuals as our residual vector, e = (e_1, e_2, ..., e_n)^T. Therefore,

e = y − ŷ.

On the other hand, we have ŷ = X β̂ = X (X^T X)^{-1} X^T y. The orthogonal projection operator

H = X (X^T X)^{-1} X^T

is called the HAT MATRIX, since it puts a hat on the response vector: ŷ = H y. This implies that the residual vector is also generated by the data set values:

e = y − H y = (I − H) y.

We can measure the variations in the data set for the response values and construct performance measures for our regression equation, to answer two questions:

Question One: How close are our predictions to reality?
Question Two: How reliable are our predictions?

Now we will answer these two questions. Measures of variation in the data for the response values:

(TOTAL)       SST = Σ_{i=1}^n (Y_i − Ȳ)²,    df_T = n − 1,   where Ȳ = (1/n) Σ_{i=1}^n Y_i.
(Regression)  SSR = Σ_{i=1}^n (Ŷ_i − Ȳ)²,    df_R = k.
(Error)       SSE = Σ_{i=1}^n (Y_i − Ŷ_i)²,   df_E = n − k − 1.

Here SST = SSR + SSE and df_T = df_R + df_E.
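The hat matrix and the sum-of-squares decomposition can be verified numerically. This sketch (toy data, not from the lecture) checks that H is an orthogonal projection and that SST = SSR + SSE holds.

```python
import numpy as np

X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])

# Hat matrix H = X (X^T X)^{-1} X^T: puts a hat on the response vector.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y                      # fitted responses, y_hat = H y
e = y - y_hat                      # residual vector, e = (I - H) y

# Sums of squares and their decomposition.
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum(e ** 2)
```

Because H is an orthogonal projection, H·H = H, and the decomposition SST = SSR + SSE follows.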

Variance estimators for the response:

MSR = SSR / df_R,   MSE = SSE / df_E   (where MSE = σ̂²).

MSR estimates the variance of the response generated by the regression equation, and MSE estimates the variance of the response generated by randomness.

Performance measures for our regression equation:

(A) Standard Error of Prediction: S_{Y·X_1 X_2 ... X_k} = √MSE (σ̂, the estimated standard deviation). This is the estimated standard deviation for the response, and it indicates the average distance between the actual response and the response estimated by our regression equation. The standard error answers the first question.

(B) Coefficient of Determination, a.k.a. R-SQUARE: r² = SSR / SST. This ratio indicates what proportion of the variation in the response behavior can be explained by the regression equation. This r² is the squared linear correlation coefficient between the response and the set of all possible causes, such that r = cos θ.

[Figure: the actual response vector y, its projection ŷ (the projected response) onto the (k + 1)-dimensional linear, or affine, space, and the angle θ between them.]
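These performance measures follow directly from the sums of squares. A short sketch (same toy data as before, made up for illustration), including the adjusted R-Square discussed next:

```python
import numpy as np

X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])
n, k = X.shape[0], X.shape[1] - 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = SST - SSE

MSE = SSE / (n - k - 1)            # estimated variance of the response
s = np.sqrt(MSE)                   # standard error of prediction, sqrt(MSE)
r2 = SSR / SST                     # coefficient of determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

Note that r2_adj is never larger than r2; the adjustment penalizes extra predictors.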

Here the angle θ is the angle between the response vector y and the space X spanned by the possible-cause information. The coefficient of determination answers the second question.

If we wish to compare many different linear regression models, then we use the adjusted form of R-Square:

r²_adj = 1 − (1 − r²) · (n − 1)/(n − k − 1).

r²_adj is a function of both the number of predictors (possible causes) used in a model and the number of observations in the data set.

We wish to use our regression equation only if we have evidence of agreement between our model and our data set. That is, we investigate our data set for evidence of our model assumptions: a linear relationship between the predictors and the response, normal behavior of the response, constant variance (homoscedasticity) of the response, and independence of the observations. Therefore, we investigate the data set for sufficient evidence of the following assumptions of the model:

1. Normality
2. Homoscedasticity
3. Linearity

Also, we investigate the data set for evidence of the independence of the observations.

1. The normality assumption indicates that the recorded response values in the data set are subject to a normal probability law. Our investigative method is to employ the normal probability plot of the recorded response values. If this plot results in a line, or a graph that is not significantly different from a line, we take this as sufficient evidence that the normality assumption is satisfied. Also, a goodness-of-fit test, such as the chi-square test, may be used to inspect the normality of the recorded response values.

2. The homoscedasticity, or constant variance, assumption for the response may be inspected by using the plot of the residuals versus the recorded response values. If we see the same spread of residuals as the recorded response values change from a small level to a high level, then we take it as sufficient evidence of homoscedasticity.

3. The linearity assumption is investigated by looking for evidence of a linear relationship between the recorded response values and the recorded collection of all possible cause values. This investigation is known as the F-test.

We prepare a summary of the data set information, called the ANOVA TABLE:

SOURCE       df          SS     MS
Regression   k           SSR    MSR
Error        n − k − 1   SSE    MSE
TOTAL        n − 1       SST

where SST = SSR + SSE and df_T = df_R + df_E. Also, MSR = SSR/df_R = SSR/k, and MSE = SSE/df_E = SSE/(n − k − 1).

Thus we construct a hypothesis test for a linear relationship between the response and the set of all possible causes (called the F-test):

H_0: β_1 = β_2 = ... = β_k = 0. (The null hypothesis declares a belief that there is no linear relation between the response and the cause factors, and that the response is generated by pure randomness.)

H_1: At least one β_j is significantly different from zero. (That is, not all coefficients of the possible causes equal zero. The alternative hypothesis declares a belief that there is a linear relationship between the response and the set of all possible causes.)

Level of significance: α.

Test statistic: F_STAT = MSR / MSE ~ F_{df_R, df_E}.

F_CRIT = F_{α; df_R, df_E} = F_{α; k, n−k−1}.

p-value = P(F > F_STAT), by using the F_{k, n−k−1} probability distribution.

Decision rule: If F_STAT > F_CRIT, then reject H_0. Or, if p-value < α, then reject H_0.

Decision, Case A: We cannot reject H_0 at the α level of significance. Thus we do not have sufficient evidence of a linear relationship between the response and the set of all possible causes.

Decision, Case B: We reject H_0 at the α level of significance. We are (1 − α)·100% confident that there is sufficient evidence of a linear relationship between the response event and the set of all possible causes.
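The ANOVA quantities and the F statistic can be computed directly from the fit. A sketch with the same made-up toy data; the comparison against F_CRIT (or a p-value) is left to an F table or a statistics library.

```python
import numpy as np

# Toy data (made up for illustration): n = 5 observations, k = 2 causes.
X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])
n, k = X.shape[0], X.shape[1] - 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
SSR = SST - SSE

MSR = SSR / k                      # regression mean square
MSE = SSE / (n - k - 1)            # error mean square
F_stat = MSR / MSE
# Reject H0 if F_stat exceeds the tabled F_{alpha; k, n-k-1}
# (the p-value can be obtained, e.g., with scipy.stats.f.sf(F_stat, k, n - k - 1)).
```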

If we decide that there is sufficient evidence of a linear relationship between the response and the set of all possible causes at a desired level of significance, α, then we investigate each and every possible cause for evidence of a linear relationship with the response event, at the same level of significance used in the F-test. This individual investigation of each possible cause identifies which possible cause information is necessary (to explain the response behavior) and which possible cause information is not needed in the regression equation. We can employ one of three alternative investigative tools for the individual inspection of X_j, for j = 1, 2, 3, ..., k:

I) t-TEST for β_j:

H_0: β_j = 0. (This statement assumes that there is no linear relationship between the response event and possible cause number j.)

H_1: β_j ≠ 0.

Level of significance: α (the same as was used in the F-TEST).

Test statistic: t_STAT = b_j / S_bj ~ t_{n−k−1}, where S_bj = S_{Y·X_1 X_2 ... X_k} / √SSX_j, S_{Y·X_1 X_2 ... X_k} = √MSE, SSX_j = Σ_{i=1}^n (X_ij − X̄_j)², and X̄_j = (1/n) Σ_{i=1}^n X_ij.

t_CRIT = t_{α/2; n−k−1}.

p-value = 2 P(T > |t_STAT|), by using the t_{n−k−1} probability distribution.

Decision rule: If |t_STAT| > t_CRIT, then reject H_0. Or, if p-value < α, then reject H_0.

Decision, Case A: We cannot reject H_0 at the α level of significance. Hence we have no evidence for the necessity of the knowledge of possible cause number j in order to estimate the response behavior. Thus we can eliminate the employment of X_j.
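A sketch of the coefficient t statistics with the same toy data. Note it uses the general matrix form of the standard error, √(MSE · [(X^T X)^{-1}]_jj), which is what statistical software reports; this is not the per-predictor formula written above, but serves the same role in the test statistic.

```python
import numpy as np

X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])
n, p = X.shape                     # p = k + 1 parameters

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
MSE = e @ e / (n - p)

# Standard errors from the general matrix form sqrt(MSE * [(X^T X)^{-1}]_jj).
se_b = np.sqrt(MSE * np.diag(np.linalg.inv(X.T @ X)))
t_stat = beta_hat / se_b           # compare |t_stat[j]| with t_{alpha/2; n-k-1}
```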

Decision, Case B: We reject H_0 at the α level of significance. Thus we are (1 − α)·100% confident that there is a linear relationship between possible cause number j and the response event. Thus we need to know X_j.

II) (1 − α)·100% confidence interval estimator of β_j, j = 1, 2, ..., k:

[b_j − t_CRIT · S_bj,  b_j + t_CRIT · S_bj]

where t_CRIT = t_{α/2; n−k−1} and S_bj = S_{Y·X_1 X_2 ... X_k} / √SSX_j, using the same α used in the F-TEST before. If this interval includes zero, then there is no sufficient evidence that β_j is significantly different from zero; therefore we do not need to know X_j. Otherwise, if this interval does not include zero, then there is strong evidence that β_j is significantly different from zero, with (1 − α) confidence; hence we need to know X_j.

III) Partial F-Test for X_j, j = 1, 2, 3, ..., k:

H_0: The knowledge of X_j does not contribute significantly to explaining the response event.

H_1: The knowledge of X_j does contribute significantly to explaining the response event.

Level of significance: α (the same used in the F-TEST).

Test statistic:

F_STAT = [SSR(ALL Xs) − SSR(ALL Xs EXCEPT X_j)] / MSE,   F_STAT ~ F_{1, n−k−1}.

Here SSR(ALL Xs) is the SSR for a regression equation that utilizes the knowledge of all the predictors, {X_1, X_2, ..., X_k}, and SSR(ALL Xs EXCEPT X_j) is the SSR for a regression equation that uses all the predictors except the knowledge of X_j: {X_1, X_2, ..., X_{j−1}, X_{j+1}, ..., X_k}. We measure the degree of contribution made by the knowledge of X_j to the prediction of the response event as the difference between the information generated by the full model and the information generated by the reduced model:

SSR(X_j | ALL Xs EXCEPT X_j) = SSR(ALL Xs) − SSR(ALL Xs EXCEPT X_j).

Also, we have F_CRIT = F_{α; 1, n−k−1}, and p-value = P(F > F_STAT), by using the F_{1, n−k−1} probability distribution.
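A sketch of the confidence interval estimator with the same toy data. The critical value below is hard-coded from a t table for this specific toy case (α = 0.05, n − k − 1 = 2 degrees of freedom); in practice it would come from a table or scipy.stats.t.ppf.

```python
import numpy as np

X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 5.0]])
y = np.array([3.6, 4.4, 7.5, 8.6, 10.9])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
MSE = e @ e / (n - p)
se_b = np.sqrt(MSE * np.diag(np.linalg.inv(X.T @ X)))

# t_{alpha/2; n-k-1} for alpha = 0.05 and 2 df, from a t table.
t_crit = 4.303

lower = beta_hat - t_crit * se_b
upper = beta_hat + t_crit * se_b
# If [lower[j], upper[j]] contains zero, there is no sufficient evidence
# that beta_j differs from zero, so X_j is not needed at this alpha.
```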

Decision rule: If F_STAT > F_CRIT, then reject H_0. Or, if p-value < α, then reject H_0.

Decision, Case A: We cannot reject H_0 at the α level of significance. Thus the knowledge of X_j is not needed.

Decision, Case B: We reject H_0 at the α level of significance. Hence we are (1 − α)·100% confident that the knowledge of X_j is necessary to predict the response event.

Here we generate another critical piece of information about the degree of contribution made by the knowledge of X_j, measured by the COEFFICIENT OF PARTIAL DETERMINATION:

r²_{Y j · {ALL Xs EXCEPT X_j}} = SSR(X_j | ALL Xs EXCEPT X_j) / [SST − SSR(ALL Xs) + SSR(X_j | ALL Xs EXCEPT X_j)].

This information is necessary to associate a monetary value with the knowledge of X_j.

IV) The independence assumption for the data set observations can be satisfied by a random selection of observations collected in the same time period. Otherwise, if the data set is a Time Series, a.k.a. Historical Data, then we investigate for sufficient evidence of positive autocorrelation between the data set observations by utilizing a Durbin-Watson test.

Interaction Effect: We may introduce an interaction effect between predictors as a multiplicative effect, generating a new predictor that represents a multiplication of individual predictors, for any desired combination of the original predictors. This new predictor is treated like any other individual one, but represents one interaction effect between the assumed combination of original predictors constituting the new predictor term.

A linear regression model accommodates quantitative variables and qualitative variables without any difficulty.

Categorical Variables: Represented by an indicator function, also known as a characteristic function, used for identifying membership in a set.
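The Durbin-Watson statistic mentioned in IV) is simple to compute from the residuals. The residual values below are made up to illustrate a positively autocorrelated (time-ordered) pattern.

```python
import numpy as np

# Hypothetical residuals from a regression on time-ordered (historical) data.
e = np.array([0.5, 0.3, -0.1, -0.4, -0.2, 0.1, 0.4, 0.2])

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals, divided by the sum of squared residuals. Values near 2 suggest
# no autocorrelation; values well below 2 suggest positive autocorrelation.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
# Here dw ≈ 0.72, well below 2, signalling positive autocorrelation.
```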

Definition: Let A be a set of elements. If an element, denoted by R, belongs to the set A, then the characteristic function of the set A takes the value one for this element. Otherwise, if R does not belong to the set A, the characteristic function of the set A takes the value zero for this element:

χ_A(R) = 1, if R ∈ A;   χ_A(R) = 0, if R ∉ A.

The characteristic function of the set A, denoted by χ_A, is also called the indicator function, and denoted by Ι_A.

In the context of a Bernoulli trial, we use the term dummy variable to identify an outcome, such that:

X = 1, if an observation is a SUCCESS;   X = 0, if an observation is a FAILURE.

Here, success and failure are generic terms that indicate a dichotomy: two distinct events that are mutually exclusive and collectively exhaustive.

Now we may introduce an application of the linear regression model to a very important decision-making activity, "How do we estimate the probability of success?" (generated by many possible causes), for binomial experiments.
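Coding a dummy variable is a one-liner in practice. A sketch with a hypothetical categorical outcome:

```python
import numpy as np

# Hypothetical categorical outcomes: two mutually exclusive,
# collectively exhaustive events (a dichotomy).
outcome = np.array(["SUCCESS", "FAILURE", "SUCCESS", "SUCCESS", "FAILURE"])

# Dummy (indicator/characteristic) variable: 1 for SUCCESS, 0 for FAILURE.
X_dummy = (outcome == "SUCCESS").astype(float)
```

The resulting column can be placed in the design matrix alongside quantitative predictors.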


More information

Lecture 5: Linear Regression

Lecture 5: Linear Regression EAS31136/B9036: Statistics in Earth & Atmospheric Sciences Lecture 5: Linear Regression Instructor: Prof. Johnny Luo www.sci.ccny.cuny.edu/~luo Dates Topic Reading (Based on the 2 nd Edition of Wilks book)

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Concordia University (5+5)Q 1.

Concordia University (5+5)Q 1. (5+5)Q 1. Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/1 40 Examination Date Time Pages Mid Term Test May 26, 2004 Two Hours 3 Instructor Course Examiner

More information

What is a Hypothesis?

What is a Hypothesis? What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population mean Example: The mean monthly cell phone bill in this city is μ = $42 population proportion Example:

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511

STAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511 STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Bayesian Analysis LEARNING OBJECTIVES. Calculating Revised Probabilities. Calculating Revised Probabilities. Calculating Revised Probabilities

Bayesian Analysis LEARNING OBJECTIVES. Calculating Revised Probabilities. Calculating Revised Probabilities. Calculating Revised Probabilities Valua%on and pricing (November 5, 2013) LEARNING OBJECTIVES Lecture 7 Decision making (part 3) Regression theory Olivier J. de Jong, LL.M., MM., MBA, CFD, CFFA, AA www.olivierdejong.com 1. List the steps

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Chapter 14 Student Lecture Notes 14-1

Chapter 14 Student Lecture Notes 14-1 Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

BNAD 276 Lecture 10 Simple Linear Regression Model

BNAD 276 Lecture 10 Simple Linear Regression Model 1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions

More information

Summary of Chapters 7-9

Summary of Chapters 7-9 Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Chapter 7 Student Lecture Notes 7-1

Chapter 7 Student Lecture Notes 7-1 Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Lecture 9 SLR in Matrix Form

Lecture 9 SLR in Matrix Form Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.

More information

NATCOR Regression Modelling for Time Series

NATCOR Regression Modelling for Time Series Universität Hamburg Institut für Wirtschaftsinformatik Prof. Dr. D.B. Preßmar Professor Robert Fildes NATCOR Regression Modelling for Time Series The material presented has been developed with the substantial

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Unbalanced Data in Factorials Types I, II, III SS Part 1

Unbalanced Data in Factorials Types I, II, III SS Part 1 Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Unit 27 One-Way Analysis of Variance

Unit 27 One-Way Analysis of Variance Unit 27 One-Way Analysis of Variance Objectives: To perform the hypothesis test in a one-way analysis of variance for comparing more than two population means Recall that a two sample t test is applied

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression Chapter 12 12-1 North Seattle Community College BUS21 Business Statistics Chapter 12 Learning Objectives In this chapter, you learn:! How to use regression analysis to predict the value of a dependent

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

STAT 212 Business Statistics II 1

STAT 212 Business Statistics II 1 STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb

More information