- Clyde Neal
- 5 years ago
Statistical Methods in Business

Lecture 5. Linear Regression

We would like to capture and represent the relationship between a set of possible causes and their response by using a statistical predictive model. Let there be k possible causes, where the value of a possible cause is denoted by $X_j$ and the index $j = 1, 2, 3, \dots, k$ points at a particular possible cause.

Linear Regression Model: Assume

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$

Here, Y denotes the value of the response, which is generated by a linear combination of all possible causes $X_j$ and by other factors that contribute to the response behavior. These other factors are assumed to be independent of the cause factors and are denoted by the independent random variable $\varepsilon$. We also assume that this random factor behaves according to a normal (a.k.a. Gaussian) probability law: it does not contribute anything to the average response behavior in the long run, yet it generates a constant variance for the response for all possible combinations of the cause values.

We wish to collect independent observations and construct a regression equation for the collected data set. Assume that we have finitely many independent observations, say n observations, in our data set. Each observation informs us about the values of each possible cause and their response:

DATA SET

$$\begin{array}{l|cccc|c}
\text{Observation} & X_1 & X_2 & \cdots & X_k & Y \\
\hline
1\text{st} & X_{11} & X_{12} & \cdots & X_{1k} & Y_1 \\
2\text{nd} & X_{21} & X_{22} & \cdots & X_{2k} & Y_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
n\text{th} & X_{n1} & X_{n2} & \cdots & X_{nk} & Y_n
\end{array}$$
Now we can fit the model to our data set and produce n equations:

$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{12} + \dots + \beta_k X_{1k} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_{21} + \beta_2 X_{22} + \dots + \beta_k X_{2k} + \varepsilon_2 \\
Y_3 &= \beta_0 + \beta_1 X_{31} + \beta_2 X_{32} + \dots + \beta_k X_{3k} + \varepsilon_3 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{n1} + \beta_2 X_{n2} + \dots + \beta_k X_{nk} + \varepsilon_n
\end{aligned}$$

Here, we know all the $Y_i$ and $X_{ij}$ values, for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, k$, from the values recorded in our data set. The $\beta_j$ values, for $j = 0, 1, 2, \dots, k$, are (assumed) unknown constants. Therefore, we will search for the best possible numerical values to estimate the unknown $\beta_j$ values by using the information in our data set.

Now we can represent the MODEL AND DATA SET INFORMATION FITTED TOGETHER with a simple algebraic notation. Let

$$y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

be the column listing of all the recorded values of the response in the data set. This is a column vector in which each coordinate gives the recorded response of one observation. We call y the response vector.

Similarly, let

$$X = \begin{pmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1k} \\
1 & X_{21} & X_{22} & \cdots & X_{2k} \\
1 & X_{31} & X_{32} & \cdots & X_{3k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{nk}
\end{pmatrix}$$

be the listing of all observed values of the possible causes in the data set, such that each row starts with a constant one followed by the ordered list of numerical values of the possible causes in that observation. This matrix of possible cause values, X, is known as the design matrix.
Also, let

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}$$

be the column listing of all unknown parameters, known as the parameter vector, and let

$$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

be the column listing of all error terms, known as the error vector. Thus, we can represent these n MODEL AND DATA FITTED TOGETHER equations as

$$y = X\beta + \varepsilon.$$

This linear algebraic equation can be used directly to search for a solution for the unknown $\beta$. Our objective is to find the best possible numerical values for $\beta$, such that the estimated $\hat{\beta}$ minimizes the prediction error.

The condition for minimizing the prediction error is that the predicted response vector, $\hat{y}$, is the orthogonal projection of the observed response vector, y, onto the linear space generated by the possible-cause information (the space spanned by the columns of X). Then the magnitude of the prediction error between the actual response vector and the predicted response vector is minimized, i.e. the error is perpendicular to the columns of X:

$$X \perp \varepsilon, \quad \text{or} \quad X^T \varepsilon = 0.$$

Since $y = X\beta + \varepsilon$, we have $\varepsilon = y - X\beta$. Thus, the minimum prediction error condition yields the best possible numerical estimate $\hat{\beta}$, such that

$$\begin{aligned}
X^T (y - X\hat{\beta}) &= 0, \\
X^T y - X^T X \hat{\beta} &= 0, \\
X^T y &= X^T X \hat{\beta}.
\end{aligned}$$

$(X^T X)$ is a square $(k+1) \times (k+1)$ matrix. Provided the columns of X are linearly independent, it is positive definite and therefore has an inverse, $(X^T X)^{-1}$. Consequently,

$$(X^T X)^{-1} (X^T y) = (X^T X)^{-1} (X^T X) \hat{\beta}.$$

Since $(X^T X)^{-1} (X^T X) = I_{(k+1) \times (k+1)}$, the identity matrix, we obtain $(X^T X)^{-1} (X^T y) = \hat{\beta}$. Therefore, the best estimated parameter vector, the one that minimizes the magnitude of the prediction error, is

$$\hat{\beta}^{*} = (X^T X)^{-1} X^T y.$$
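The normal equations above can be sketched numerically. The following is a minimal illustration, assuming NumPy is available; the function name `fit_least_squares` and the small data set are hypothetical and not part of the lecture. The data are generated exactly by Y = 1 + 2 X_1 + 3 X_2 with no random term, so the estimate should recover those coefficients and the residual should be orthogonal to the columns of X.

```python
import numpy as np

def fit_least_squares(causes, y):
    """Return the design matrix X and b = (X^T X)^{-1} X^T y."""
    causes = np.asarray(causes, dtype=float)
    y = np.asarray(y, dtype=float)
    n = causes.shape[0]
    # Design matrix: a leading column of ones, then X_1, ..., X_k.
    X = np.column_stack([np.ones(n), causes])
    # Solve the normal equations (X^T X) b = X^T y rather than forming the inverse.
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return X, b

# Hypothetical data, generated exactly by Y = 1 + 2*X1 + 3*X2 (no random term).
causes = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]])
y = 1.0 + 2.0 * causes[:, 0] + 3.0 * causes[:, 1]

X, b = fit_least_squares(causes, y)
residual = y - X @ b          # e = y - X b
orthogonality = X.T @ residual  # should be (numerically) the zero vector
```

Solving the linear system instead of inverting $X^T X$ explicitly is a numerical convenience; both routes express the same estimator when the columns of X are linearly independent.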
Let

$$\hat{\beta}^{*} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}.$$

Now we can construct the regression equation for the data set by using these estimated parameter values:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k.$$

Thus, we can estimate the response values by using the possible cause values recorded in the data set:

$$\begin{aligned}
\hat{Y}_1 &= b_0 + b_1 X_{11} + b_2 X_{12} + \dots + b_k X_{1k}, \\
\hat{Y}_2 &= b_0 + b_1 X_{21} + b_2 X_{22} + \dots + b_k X_{2k}, \\
&\;\;\vdots \\
\hat{Y}_n &= b_0 + b_1 X_{n1} + b_2 X_{n2} + \dots + b_k X_{nk}.
\end{aligned}$$

Or, better yet, we can write a column list of these estimated response values:

$$\hat{y} = \begin{pmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{pmatrix}, \quad \text{and thus we have} \quad \hat{y} = X \hat{\beta}^{*}.$$

If we look at the differences between what is observed and what is estimated for the response, we generate residuals:

$$\begin{aligned}
e_1 &= Y_1 - \hat{Y}_1, \\
e_2 &= Y_2 - \hat{Y}_2, \\
&\;\;\vdots \\
e_n &= Y_n - \hat{Y}_n.
\end{aligned}$$
Hence, we can write a column list of these residuals as our residual vector:

$$e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$

Therefore, $e = y - \hat{y}$. On the other hand, we have $\hat{y} = X \hat{\beta}^{*} = X (X^T X)^{-1} (X^T y)$. Hence, the orthogonal projection operator $X (X^T X)^{-1} X^T$ is called the HAT MATRIX and is denoted by H, since it puts a hat on the response vector. Thus, $\hat{y} = Hy$. This implies that the residual vector is generated by the data set values:

$$e = y - Hy = (I - H)y.$$

We can measure the variations in the data set for the response values and construct performance measures for our regression equation, to answer two questions:

Question One: How close are our predictions to reality?

Question Two: How reliable are our predictions?

Now we will answer these two questions.

Measures of variation in the data for the response values:

$$\begin{aligned}
\text{(Total)} \quad & SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, & df_T &= n - 1, \quad \text{where } \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \\
\text{(Regression)} \quad & SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, & df_R &= k, \\
\text{(Error)} \quad & SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, & df_E &= n - k - 1,
\end{aligned}$$

where $SST = SSR + SSE$ and $df_T = df_R + df_E$.
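The hat matrix and the decomposition SST = SSR + SSE can be checked numerically. This is a minimal sketch, assuming NumPy; the helper name `anova_sums` and the five data points are hypothetical, chosen only for illustration.

```python
import numpy as np

def anova_sums(X, y):
    """Return the hat matrix H and the sums of squares SST, SSR, SSE."""
    # H = X (X^T X)^{-1} X^T, computed via a linear solve.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat = H @ y            # predicted responses, y_hat = H y
    e = y - y_hat            # residual vector, e = (I - H) y
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)
    ssr = np.sum((y_hat - y_bar) ** 2)
    sse = np.sum(e ** 2)
    return H, sst, ssr, sse

# Hypothetical data: one possible cause, five observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

H, sst, ssr, sse = anova_sums(X, y)
```

Because H is an orthogonal projection, applying it twice changes nothing (H H = H), and its trace equals the number of estimated parameters, k + 1.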
Variance Estimators for the response:

$$MSR = \frac{SSR}{df_R}, \qquad MSE = \frac{SSE}{df_E} \quad (\text{where } MSE = \hat{\sigma}^2).$$

MSR estimates the variance of the response generated by the regression equation, and MSE estimates the variance of the response generated by randomness.

Performance Measures for our Regression Equation:

(A) Standard Error of Prediction

$$S_{Y \cdot X_1 X_2 \dots X_k} = \sqrt{MSE} \quad (\hat{\sigma}, \text{ the estimated standard deviation}).$$

This is the estimated standard deviation for the response, and it indicates the typical distance between the actual response and the response estimated by our regression equation. The standard error answers the first question.

(B) Coefficient of Determination, a.k.a. R-SQUARE

$$r^2 = \frac{SSR}{SST}.$$

This ratio indicates what proportion of the variation in the response behavior can be explained by the regression equation. This $r^2$ is the squared linear correlation coefficient between the response and the set of all possible causes, such that $r = \cos\theta$.

[Figure: the actual response vector and its projection (the projected response) onto the (k + 1)-dimensional linear, or affine, space.]
Here, the angle $\theta$ is the angle between the response vector, y, and the linear space generated by the possible-cause information in X. The coefficient of determination answers the second question.

If we wish to compare many different linear regression models, then we use the adjusted form of R-Square:

$$r^2_{adj} = 1 - \left[ (1 - r^2) \, \frac{n - 1}{n - k - 1} \right].$$

$r^2_{adj}$ is a function of both the number of predictors (possible causes) used in a model and the number of observations in the data set.

We wish to use our regression equation only if we have evidence of agreement between our model and our data set. That is, we investigate our data set for evidence of our model assumptions: a linear relationship between the predictors and the response, normal behavior of the response, constant variance (homoscedasticity) of the response, and independence of the observations. Therefore, we investigate the data set for sufficient evidence of the following assumptions of the model:

1. Normality
2. Homoscedasticity
3. Linearity

We also investigate the data set for evidence of the independence of the observations.

1. The normality assumption indicates that the response values recorded in the data set are subject to a normal probability law. Our investigative method is to employ the normal probability plot of the recorded response values (in practice the plot is usually drawn for the residuals, since the assumption concerns the random term). If this plot results in a line, or in a graph that is not significantly different from a line, we take this as sufficient evidence that the normality assumption is satisfied. A goodness-of-fit test, such as the chi-square test, may also be used to inspect the normality of the recorded response values.

2. The homoscedasticity, or constant variance, assumption for the response may be inspected by using the residual plot of the residuals versus the recorded response values. If we see the same spread of residuals as the recorded response values change from a low level to a high level, then we take it as sufficient evidence of homoscedasticity.

3. The linearity assumption is investigated by looking for evidence of a linear relationship between the recorded response values and the recorded collection of all possible cause values. This investigation is known as the F-test.
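The "is the normal probability plot a line?" check in item 1 can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the plotting position (i - 0.5)/n is one common convention and is not specified in the lecture, the function names are hypothetical, and the correlation of the plotted points is used here as a rough numerical stand-in for judging straightness by eye.

```python
from statistics import NormalDist, mean

def normal_plot_points(values):
    """Pair each ordered value with the standard normal quantile of its plotting position."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered

def straightness(quantiles, ordered):
    """Pearson correlation of the plotted points; values near 1 suggest a straight line."""
    qm, om = mean(quantiles), mean(ordered)
    sxy = sum((q - qm) * (o - om) for q, o in zip(quantiles, ordered))
    sxx = sum((q - qm) ** 2 for q in quantiles)
    syy = sum((o - om) ** 2 for o in ordered)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values: normal quantiles shifted and scaled, so the plot is an exact line.
q, _ = normal_plot_points(range(9))
sample = [10.0 + 2.0 * z for z in q]

quantiles, ordered = normal_plot_points(sample)
r_line = straightness(quantiles, ordered)
```

With real data the correlation will fall below 1; how far below is acceptable is a judgment call that this sketch does not settle.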
We prepare a summary of the data set information, called the ANOVA TABLE:

$$\begin{array}{l|ccc}
\text{SOURCE} & df & SS & MS \\
\hline
\text{Regression} & k & SSR & MSR \\
\text{Error} & n - k - 1 & SSE & MSE \\
\hline
\text{TOTAL} & n - 1 & SST &
\end{array}$$

where $SST = SSR + SSE$ and $df_T = df_R + df_E$. Also,

$$MSR = \frac{SSR}{df_R} = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{df_E} = \frac{SSE}{n - k - 1}.$$

Thus, we construct a hypothesis test for a linear relationship between the response and the set of all possible causes (called the F-test):

$H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$. (The null hypothesis declares a belief that there is no linear relation between the response and the cause factors, and that the response is generated by pure randomness.)

$H_1$: At least one $\beta_j$ is significantly different from zero. (That is, not all coefficients of the possible causes are equal to zero. The alternative hypothesis declares a belief that there is a linear relationship between the response and the set of all possible causes.)

Level of significance: $\alpha$

Test statistic: $F_{STAT} = \dfrac{MSR}{MSE} \sim F_{df_R,\, df_E}$.

Critical value: $F_{CRIT} = F_{\alpha;\, df_R,\, df_E} = F_{\alpha;\, k,\, (n-k-1)}$.

p-value $= P(F > F_{STAT})$, by using the $F_{k,\,(n-k-1)}$ probability distribution.

Decision Rule: If $F_{STAT} > F_{CRIT}$, then reject $H_0$. Or, if p-value $< \alpha$, then reject $H_0$.

Decision: Case-A. We cannot reject $H_0$ at the $\alpha$ level of significance. Thus, we do not have sufficient evidence of a linear relationship between the response and the set of all possible causes.

Decision: Case-B. We reject $H_0$ at the $\alpha$ level of significance. We are $(1 - \alpha) \cdot 100\%$ confident that there is sufficient evidence of a linear relationship between the response event and the set of all possible causes.
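The decision rule of the F-test can be sketched in a few lines of plain Python. The function name `overall_f_test` and the ANOVA numbers below are hypothetical. The critical value is passed in by the caller, since it has to come from an F table or from statistical software; 4.10 is approximately the tabulated value for a 0.05 level of significance with (2, 10) degrees of freedom.

```python
def overall_f_test(ssr, sse, n, k, f_critical):
    """Return (F_STAT, reject_H0) for the overall F-test of the regression."""
    msr = ssr / k                # MSR = SSR / df_R
    mse = sse / (n - k - 1)      # MSE = SSE / df_E
    f_stat = msr / mse
    # Decision rule: reject H0 when F_STAT exceeds the critical value.
    return f_stat, f_stat > f_critical

# Hypothetical ANOVA numbers: SSR = 80, SSE = 20, n = 13 observations, k = 2 possible causes.
f_stat, reject = overall_f_test(80.0, 20.0, 13, 2, f_critical=4.10)
```

Here MSR = 40 and MSE = 2, so F_STAT = 20, which exceeds the critical value and leads to Case-B.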
If we decide that there is sufficient evidence of a linear relationship between the response and the set of all possible causes at a desired level of significance, $\alpha$, then we investigate each and every possible cause for evidence of a linear relationship with the response event, at the same level of significance used in the F-test. This individual investigation of each possible cause will identify which possible cause information is necessary (to explain the response behavior) and which possible cause information is not needed for the regression equation.

We can employ one of three alternative investigative tools for the individual inspection of $X_j$, for $j = 1, 2, 3, \dots, k$:

I) t-TEST for $\beta_j$:

$H_0$: $\beta_j = 0$. (This statement assumes that there is no linear relationship between the response event and possible cause number j.)

$H_1$: $\beta_j \neq 0$.

Level of significance: $\alpha$ (the same as was used in the F-TEST)

Test statistic: $t_{STAT} = \dfrac{b_j}{S_{b_j}} \sim t_{n-k-1}$, where

$$S_{b_j} = \frac{S_{Y \cdot X_1 X_2 \dots X_k}}{\sqrt{SSX_j}}, \qquad S_{Y \cdot X_1 X_2 \dots X_k} = \sqrt{MSE}, \qquad SSX_j = \sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2, \qquad \bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}.$$

(This form of $S_{b_j}$ is exact when $X_j$ is uncorrelated with the other possible causes; in general, $S_{b_j}$ is $\sqrt{MSE}$ times the square root of the corresponding diagonal entry of $(X^T X)^{-1}$.)

Critical value: $t_{CRIT} = t_{\alpha/2;\, n-k-1}$.

p-value $= P(|T| > |t_{STAT}|)$, by using the $t_{n-k-1}$ probability distribution.

Decision Rule: If $|t_{STAT}| > t_{CRIT}$, then reject $H_0$. Or, if p-value $< \alpha$, then reject $H_0$.

Decision: Case-A. We cannot reject $H_0$ at the $\alpha$ level of significance. Hence, we have no evidence that knowledge of possible cause number j is necessary in order to estimate the response behavior. Thus, we can eliminate $X_j$ from the regression equation.
Decision: Case-B. We reject $H_0$ at the $\alpha$ level of significance. Thus, we are $(1 - \alpha) \cdot 100\%$ confident that there is a linear relationship between possible cause number j and the response event. Thus, we need to know $X_j$.

II) $(1 - \alpha) \cdot 100\%$ confidence interval estimator of $\beta_j$; $j = 1, 2, \dots, k$:

$$\left[\, b_j - t_{CRIT} \, S_{b_j}, \;\; b_j + t_{CRIT} \, S_{b_j} \,\right],$$

where $t_{CRIT} = t_{\alpha/2;\,(n-k-1)}$ and $S_{b_j} = \dfrac{S_{Y \cdot X_1 X_2 \dots X_k}}{\sqrt{SSX_j}}$, using the same $\alpha$ as in the F-TEST before.

If this interval includes zero, then there is not sufficient evidence that $\beta_j$ is significantly different from zero. Therefore, we do not need to know $X_j$. Otherwise, if this interval does not include zero, then there is strong evidence that $\beta_j$ is significantly different from zero, with $(1 - \alpha)$ confidence. Hence, we need to know $X_j$.

III) Partial F-Test for $X_j$; $j = 1, 2, 3, \dots, k$.

$H_0$: The knowledge of $X_j$ does not contribute significantly to explaining the response event.

$H_1$: The knowledge of $X_j$ does contribute significantly to explaining the response event.

Level of significance: $\alpha$ (the same as used in the F-TEST)

Test statistic:

$$F_{STAT} = \frac{SSR(\text{ALL Xs}) - SSR(\text{ALL Xs EXCEPT } X_j)}{MSE}, \qquad F_{STAT} \sim F_{1,\,(n-k-1)}.$$

Here SSR(ALL Xs) is the SSR for a regression equation that uses the knowledge of all the predictors, $\{X_1, X_2, \dots, X_k\}$, and SSR(ALL Xs EXCEPT $X_j$) is the SSR for a regression equation that uses all the predictors except $X_j$: $\{X_1, X_2, \dots, X_{j-1}, X_{j+1}, \dots, X_k\}$. We measure the degree of contribution made by the knowledge of $X_j$ to the prediction of the response event as the difference between the information generated by the full model and the information generated by the reduced model:

$$SSR(X_j \mid \text{ALL Xs EXCEPT } X_j) = SSR(\text{ALL Xs}) - SSR(\text{ALL Xs EXCEPT } X_j).$$

Also, we have $F_{CRIT} = F_{\alpha;\, 1,\, n-k-1}$.

p-value $= P(F > F_{STAT})$, by using the $F_{1,\, n-k-1}$ probability distribution.
Decision Rule: If $F_{STAT} > F_{CRIT}$, then reject $H_0$. Or, if p-value $< \alpha$, then reject $H_0$.

Decision: Case-A. We cannot reject $H_0$ at the $\alpha$ level of significance. Thus, the knowledge of $X_j$ is not needed.

Decision: Case-B. We reject $H_0$ at the $\alpha$ level of significance. Hence, we are $(1 - \alpha) \cdot 100\%$ confident that the knowledge of $X_j$ is necessary to predict the response event.

Here, we generate another critical piece of information about the degree of contribution made by the knowledge of $X_j$, measured by the COEFFICIENT OF PARTIAL DETERMINATION, denoted by $r^2_{Yj \cdot \{\text{ALL Xs EXCEPT } X_j\}}$:

$$r^2_{Yj \cdot \{\text{ALL Xs EXCEPT } X_j\}} = \frac{SSR(X_j \mid \text{ALL Xs EXCEPT } X_j)}{SST - SSR(\text{ALL Xs}) + SSR(X_j \mid \text{ALL Xs EXCEPT } X_j)}.$$

This information is necessary to associate a monetary value with the knowledge of $X_j$.

IV) The independence assumption for the data set observations can be satisfied by a random selection of observations collected in the same time period. Otherwise, if the data set is a Time-Series, a.k.a. Historical Data, then we investigate for sufficient evidence of positive autocorrelation between data set observations by utilizing a Durbin-Watson test.

Interaction Effect: We may introduce the interaction effect between predictors as a multiplicative effect and generate a new predictor to represent a multiplication of individual predictors, for all possible combinations of the original predictors. This new predictor is treated like any other individual predictor, but it represents one interaction effect between the assumed combination of original predictors that constitute the new predictor term.

A linear regression model accommodates quantitative variables and qualitative variables without any difficulty.

Categorical Variables: Represented by an indicator function, also known as a characteristic function, used for identifying membership in a set.
Definition: Let A be a set of elements. If an element, denoted by R, belongs to the set A, then the characteristic function of the set A takes the value one for this element. Otherwise, if R does not belong to the set A, the characteristic function of the set A takes the value zero for this element, such that

$$\chi_A(R) = \begin{cases} 1, & \text{if } R \in A, \\ 0, & \text{if } R \notin A. \end{cases}$$

The characteristic function of the set A, denoted by $\chi_A$, is also called the indicator function and denoted by $I_A$.

In the context of a Bernoulli trial, we use the term dummy variable to identify an outcome, such that:

$$X = \begin{cases} 1, & \text{if an observation is a SUCCESS}, \\ 0, & \text{if an observation is a FAILURE}. \end{cases}$$

Here, success and failure are generic terms that indicate a dichotomy, that is, two distinct events that are mutually exclusive and collectively exhaustive.

Now we may introduce an application of the linear regression model to a very important decision-making activity for binomial experiments: how to estimate the probability of success (generated by many possible causes)?
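The dummy variable, the interaction term, and the extra sum of squares SSR(X_j | ALL Xs EXCEPT X_j) discussed above can be sketched together. This is a minimal illustration, assuming NumPy; the helper names `indicator` and `ssr_of`, the group labels, and the data are all hypothetical. The response is generated exactly (no random term), so the full model recovers the generating coefficients and MSE is zero; the sketch therefore stops at the extra sum of squares rather than forming the partial F ratio.

```python
import numpy as np

def indicator(labels, member):
    """Characteristic function: 1.0 where the label equals `member`, else 0.0."""
    return np.array([1.0 if lab == member else 0.0 for lab in labels])

def ssr_of(X, y):
    """Fit by least squares and return (SSR, coefficient vector)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    return np.sum((y_hat - y.mean()) ** 2), b

# Hypothetical data: one quantitative cause and one two-level categorical cause.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
d = indicator(group, "B")                        # dummy variable: 1 for group B
y = 2.0 + 1.0 * x1 + 3.0 * d + 0.5 * x1 * d      # generated exactly, no random term

ones = np.ones_like(x1)
X_full = np.column_stack([ones, x1, d, x1 * d])  # includes the interaction predictor
X_reduced = np.column_stack([ones, x1, d])       # all predictors except the interaction

ssr_full, b_full = ssr_of(X_full, y)
ssr_reduced, _ = ssr_of(X_reduced, y)
# Contribution of the interaction term: SSR(full) - SSR(reduced).
extra_ssr = ssr_full - ssr_reduced
```

With noisy data, dividing `extra_ssr` by the full model's MSE would give the partial F statistic described earlier.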
Valua%on and pricing (November 5, 2013) LEARNING OBJECTIVES Lecture 7 Decision making (part 3) Regression theory Olivier J. de Jong, LL.M., MM., MBA, CFD, CFFA, AA www.olivierdejong.com 1. List the steps
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6
STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf
More informationChapter 14 Student Lecture Notes 14-1
Chapter 14 Student Lecture Notes 14-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter 14 Multiple Regression Analysis and Model Building Chap 14-1 Chapter Goals After completing this
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationLINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises
LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on
More informationBusiness Statistics. Lecture 10: Correlation and Linear Regression
Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form
More informationBNAD 276 Lecture 10 Simple Linear Regression Model
1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions
More informationSummary of Chapters 7-9
Summary of Chapters 7-9 Chapter 7. Interval Estimation 7.2. Confidence Intervals for Difference of Two Means Let X 1,, X n and Y 1, Y 2,, Y m be two independent random samples of sizes n and m from two
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationThe simple linear regression model discussed in Chapter 13 was written as
1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple
More informationConfidence Interval for the mean response
Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationNATCOR Regression Modelling for Time Series
Universität Hamburg Institut für Wirtschaftsinformatik Prof. Dr. D.B. Preßmar Professor Robert Fildes NATCOR Regression Modelling for Time Series The material presented has been developed with the substantial
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationModel Building Chap 5 p251
Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4
More informationLecture 3: Inference in SLR
Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals
More informationNotes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1
Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
What is Multiple Linear Regression Several independent variables may influence the change in response variable we are trying to study. When several independent variables are included in the equation, the
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationUnbalanced Data in Factorials Types I, II, III SS Part 1
Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted
More informationSimple linear regression
Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationSTAT 350 Final (new Material) Review Problems Key Spring 2016
1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,
More informationK. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =
K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationInference for Regression Simple Linear Regression
Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating
More informationUnit 27 One-Way Analysis of Variance
Unit 27 One-Way Analysis of Variance Objectives: To perform the hypothesis test in a one-way analysis of variance for comparing more than two population means Recall that a two sample t test is applied
More informationEcon 3790: Business and Economics Statistics. Instructor: Yogesh Uppal
Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of
More informationVariance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017
Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf
More informationSTA121: Applied Regression Analysis
STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using
More informationChapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression
Chapter 12 12-1 North Seattle Community College BUS21 Business Statistics Chapter 12 Learning Objectives In this chapter, you learn:! How to use regression analysis to predict the value of a dependent
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationSTAT 212 Business Statistics II 1
STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb
More information