Dummy Variable Multiple Regression Forecasting Model
International Journal of Engineering Science Invention
www.ijesi.org ǁ Volume 2, Issue 3 ǁ March 2013

Dummy Variable Multiple Regression Forecasting Model

Nwankwo, H.¹, Oyeka, I.²
¹Department of Statistics, Nnamdi Azikiwe University, Awka, Nigeria
²Department of Statistics, Nnamdi Azikiwe University, Awka, Nigeria

ABSTRACT: A method of using multiple regression to make forecasts for data arranged in a sequence, including time order, is presented here. The method uses dummy variables, which makes it robust. The design matrix is obtained by a cumulative coding procedure, which enables it to overcome the setback of equal spacing associated with dummy variable regression methods. The use of direct effects, obtained by the method of path analysis, makes the whole procedure unique.

Keywords: Cumulative Coding, Dummy, Path Analysis, Design Matrix, Robust, Direct Effect

I. INTRODUCTION
Usually, multiple regression models are used for estimating the contributions or effects of independent variables on a dependent variable. A first-order multiple regression model with two independent variables X1 and X2 has the form [1]

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + ε_i    (1)

This model is linear in the parameters and linear in the independent variables. Y_i denotes the response in the i-th trial, and X_i1 and X_i2 are the values of the two independent variables in the i-th trial. The parameters of the model are β_0, β_1 and β_2, and the error term is ε_i. Assuming that E(ε_i) = 0, the regression function for (1) is

E(Y) = β_0 + β_1 X1 + β_2 X2    (2)

Note that the regression function (2) is a plane and not a line. The parameter β_0 is the Y intercept of the regression plane. The parameter β_1 indicates the change in the mean response per unit increase in X1 when X2 is held constant. Likewise, β_2 indicates the change in the mean response per unit increase in X2 when X1 is held constant. When there are more than two independent variables, say p−1 independent variables X1, …, X_{p−1}, the first-order model is

Y_i = β_0 + β_1 X_i1 + β_2 X_i2 + … + β_{p−1} X_{i,p−1} + ε_i    (3)
Assuming that E(ε_i) = 0, the response function for (3) is

E(Y) = β_0 + β_1 X1 + β_2 X2 + … + β_{p−1} X_{p−1}    (4)

The response function (4) is a hyperplane, that is, a plane in more than two dimensions. It is no longer possible to picture this response surface as we were able to do with (2), which is a plane in three dimensions. Multiple regression is one of the most widely used of all statistical tools. A regression model for which the response surface is a plane can be used either in its own right, when appropriate, or as an approximation to a more complex response surface: many complex response surfaces are approximated well by a plane over limited ranges of the independent variables [1]. The use of multiple regression is largely limited to estimating the contributions, or effects, of the independent variables on the dependent variable. Forecasting values of the dependent variable using multiple regression models is often of interest to researchers, though forecasting is not a common feature of existing regression methods. The problem is that independent variables are not available for the period for which forecasts are sought.
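As a minimal numerical sketch of fitting the first-order model (1) by ordinary least squares, b = (X'X)⁻¹X'y, here is an illustration on simulated data (the data, coefficients, and seed are assumptions for illustration, not values from the paper):

```python
import numpy as np

# Simulated data for Y = b0 + b1*X1 + b2*X2 + error, as in equation (1).
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(0, 1, n)

# Design matrix with an intercept column, then the least-squares
# estimate b = (X'X)^{-1} X'y.
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # roughly [2.0, 1.5, -0.5]
```

The fitted b recovers the simulated coefficients up to sampling noise, and the same matrix formula carries over unchanged to the dummy-variable design matrices developed below.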
A method for carrying out forecasts of this sort is proposed here and is applicable when the data are in at least ordinal form.

II. PROPOSED METHOD
To achieve the objective of this paper, which is to develop a multiple regression forecasting model, we propose the use of dummy variable multiple regression modelling methods [2]. Specifically, a dummy variable coding system is used in which each category or level of a parent independent variable in a regression model is represented by a pattern of 1s and 0s, forming a dummy variable set. In order to avoid linear dependence among the dummy variables of a parent variable, each parent variable is always represented by one dummy variable less than the number of its categories [3][4]. Thus if a given parent variable Z has z categories or levels, the corresponding design matrix will be represented ordinally by z−1 column vectors of ordinally coded dummy variables x_d of 1s and 0s (for d = 1, 2, …, z−1). The 1s and 0s in each x_d are cumulative if the observations at each level of the parent variable are arranged together. Specifically, the p-th level (p = 1, 2, …, z) of the z levels of a parent variable Z is represented by the z−1 ordinally coded column vectors of 1s and 0s through

x_id = 1 if observation i belongs to a level of Z higher than level d, and x_id = 0 otherwise, for d = 1, 2, …, z−1    (5)

Then if, without loss of generality, the observations in each level of Z are arranged together, the n × (z−1) design matrix representing Z consists of a set of z−1 cumulatively coded column vectors x_d of 1s and 0s of the form

X = [x_1 x_2 … x_{z−1}], where column x_d has 0 in its first n_1 + … + n_d positions and 1 in the remaining positions    (6)

Equation (6) is a prototype of an ordinally coded design matrix with z−1 cumulatively coded column vectors x_d of 1s and 0s representing the z levels of the parent variable Z. Note that the first n_1 elements of the first column x_1, representing the first level of Z, are 0s while the remaining n−n_1 are all 1s. The first n_1 + n_2 elements of
x_2 are 0s, while the remaining n−(n_1+n_2) elements are all 1s, and so on, until finally all the elements of x_{z−1} are 0s except the last n_z elements, which are all 1s. Note that all the observations in the first level (level 1) of Z are coded 0 in all the columns of the design matrix, while observations in the last level (level z) of Z are coded 1 in all the columns. Note also that Z may be any of a set of parent independent variables such as A, B, C, etc., with levels a, b, c, etc. respectively. An ordinal dummy variable multiple regression model of y_i on the x_ij's may be expressed as

y_i = β_0 + β_1;A x_i1;A + β_2;A x_i2;A + … + β_{a−1};A x_{i,a−1};A + … + β_{c−1};C x_{i,c−1};C + e_i    (7)

where the β_j's are partial regression coefficients and the e_i are error terms uncorrelated with the x_ij's, with E(e_i) = 0; A has a levels, B has b levels, C has c levels, etc. Note that the expected value of y_i is

E(y_i) = β_0 + β_1;A x_i1;A + β_2;A x_i2;A + … + β_{a−1};A x_{i,a−1};A + … + β_{c−1};C x_{i,c−1};C    (8)

Equation (7) may alternatively be expressed in matrix form as

y = Xβ + e    (9)

where y is an n×1 column vector of outcome values; X is an n×r cumulatively coded design matrix of 1s and 0s; β is an r×1 column vector of regression coefficients; and e is an n×1 column vector of error terms uncorrelated with X, with E(e) = 0, where r is the rank of the design matrix. Use of the method of least squares with either equation (7) or (9) yields an unbiased estimator of β as

b = (X'X)⁻¹ X'y    (10)

where (X'X)⁻¹ is the matrix inverse of X'X. The resulting predicted regression model is ŷ = Xb. The following analysis of variance (ANOVA) table (Table 1) enables the testing of the adequacy of equation (7) or (9) using the F test.

Table 1: Analysis of Variance (ANOVA) Table for Equation (9)

Source of Variation | Sum of Squares (SS)   | Degrees of Freedom (DF) | Mean Sum of Squares (MS) | F Ratio
Regression          | SSR = b'X'y − n·ȳ²    | r − 1                   | MSR = SSR/(r−1)          | F = MSR/MSE
Error               | SSE = y'y − b'X'y     | n − r                   | MSE = SSE/(n−r)          |
Total               | SST = y'y − n·ȳ²      | n − 1                   |                          |    (11)

The null hypothesis to be tested for the adequacy of the regression model (equation 7 or 9) is

H_0: β = 0 versus H_1: β ≠ 0    (12)

H_0 is tested using the test statistic F = MSR/MSE. This has an F distribution with r−1 and
n−r degrees of freedom. H_0 is rejected at the α level of significance if

F ≥ F_{1−α; r−1, n−r}    (13)

Otherwise we do not reject H_0, where F_{1−α; r−1, n−r} is the critical value of the F distribution with r−1 and n−r degrees of freedom for a specified α level.
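A sketch of the F test of Table 1, together with the contrast t test that follows in equations (15)–(16), on simulated cumulatively coded data (the data, effect sizes, and seed are assumptions for illustration, not values from the paper):

```python
import numpy as np
from scipy import stats

# Illustrative data: a 3-level parent variable with cumulatively coded
# dummies as in equation (5): x_d = 1 when the level exceeds d.
rng = np.random.default_rng(2)
levels = np.repeat([1, 2, 3], 10)
Xd = np.column_stack([(levels > d).astype(int) for d in (1, 2)])
X = np.column_stack([np.ones(len(levels)), Xd])
y = 5.0 + 1.5 * Xd[:, 0] + 0.2 * Xd[:, 1] + rng.normal(0, 0.3, len(levels))

n, r = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                        # equation (10)

# ANOVA of Table 1: SSR = b'X'y - n*ybar^2, SSE = y'y - b'X'y.
ssr = b @ X.T @ y - n * y.mean() ** 2
sse = y @ y - b @ X.T @ y
F = (ssr / (r - 1)) / (sse / (n - r))
p_F = stats.f.sf(F, r - 1, n - r)

# Contrast t test of equation (15): H0: beta_1 = beta_2, with a vector c
# carrying 1 and -1 at the two coefficients' positions.
mse = sse / (n - r)
c = np.array([0.0, 1.0, -1.0])
t = (c @ b) / np.sqrt(mse * (c @ XtX_inv @ c))
p_t = 2 * stats.t.sf(abs(t), n - r)
print(F, p_F, t, p_t)
```

With the strong simulated effects, both the overall F test and the contrast between the two dummy coefficients come out significant; note that SSR + SSE equals SST exactly, as Table 1 requires.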
If H_0 is rejected, indicating that not all β_j's are equal to zero, then other hypotheses concerning the β_j's may be tested. Note that β_k is interpreted in the ordinal dummy variable regression model as the amount by which the dependent variable y on the average changes for every unit increase in x_k compared with x_{k−1}, or one unit decrease in x_k relative to x_{k+1}, when all other independent variables in the model are held constant. That is, β_k measures the amount by which, on the average, the dependent variable y increases or decreases for every unit change in x_k compared with a corresponding unit change in either x_{k−1} or x_{k+1} respectively, when all other independent variables in the model are held constant. Research interest may be in comparing the differential effects of any two ordinal dummy variables of a parent independent variable on the dependent variable. For example, one may be interested in testing the null hypothesis

H_0: β_l;A = β_j;A versus H_1: β_l;A ≠ β_j;A    (14)

where the β's are estimated from equation (10) by the corresponding b's, for l = 1, 2, …, a−1; j = 1, 2, …, a−1; l ≠ j. The null hypothesis of equation (14) may be tested using the test statistic

t = (b_l;A − b_j;A) / se(b_l;A − b_j;A) = c'b / √(MSE · c'(X'X)⁻¹c)    (15)

where c is an r-row vector of the form (0, 0, …, 1, …, −1, …, 0), in which the 1 and −1 correspond to the positions of β_l;A and β_j;A respectively in the r×1 column vector b, and all other elements of c are 0. H_0 is rejected at the α level of significance if

|t| ≥ t_{1−α; n−r}    (16)

Otherwise we do not reject H_0, where t_{1−α; n−r} is the critical value of the t distribution with n−r degrees of freedom for a specified α level.

In general, several other hypotheses may be tested. For example, one may be interested in comparing the effects of the i-th level of factor A, say, and the j-th level of factor C, say, or of some combinations of some levels of several factors. Thus interest may be in testing

H_0: β_i;A = β_j;C versus H_1: β_i;A ≠ β_j;C    (17)

using the test statistic

t = (b_i;A − b_j;C) / se(b_i;A − b_j;C) = c'b / √(MSE · c'(X'X)⁻¹c)    (18)

where c is a row vector as in equation (15), except that the 1 and −1 now occur at the positions corresponding to the i-th level of factor A and the j-th level
of factor C in b. H_0 is rejected as in equation (16). Further interest may also be in estimating the total or overall effect of a given parent independent variable, through the effects of its representative ordinal dummy variables, on the dependent variable. To do this, it should be noted that any parent variable is completely determined by its set of representative ordinal dummy variables.

III. ESTIMATION OF EFFECTS
For the purposes of parameter estimation in dummy variable regression, the dummy variables of a parent independent variable are treated as intermediate (independent) variables between the parent variable and the dependent variable of interest [4][5]. Each dummy variable of the parent independent variable is then treated as a separate variable determined by its parent variable and determining the specified dependent variable. Therefore, in developing an expression for the regression effect of a parent independent variable on a specified
dependent variable, use is made of the simple effects of the set of dummy variables representing that parent independent variable on the dependent variable. Now an equation expressing the determination of a dependent variable by the d-th dummy variable x_d (d = 1, 2, …, s−1) of a parent independent variable V with s levels may be written in the form

y = β_d x_d + β_(x) x_(x) + e    (19)

In equation (19), y is an n-column vector representing the dependent variable; x_d is an n-column vector denoting the d-th ordinal dummy variable representing the parent independent variable V, for d = 1, 2, …, s−1; x_(x) is an n × (s−2) matrix of full column rank s−2 representing all the other s−2 ordinal dummy variables for V; β_d is the regression effect of x_d on y; β_(x) is an (s−2)-column vector of the regression effects of x_(x) on y; and e is an n-column vector of uncorrelated error terms. Note that equation (19) may be expressed more compactly in the form

y = Xβ + e    (20)

where X = (x_d, x_(x)) and

β̂ = (X'X)⁻¹ X'y    (21)

Without loss of generality, take β_d as the first component of β. In ordinal dummy variable regression, β_d is a regression coefficient measuring the net effect of the d-th level of a parent independent variable V on a dependent variable y, relative to the effect of the adjacent level of V, after adjustments have been made for the effects of the other variables in the regression model. Now, to estimate the direct effect [6][2], B_v, of a given parent independent variable V on a dependent variable y, the dummies of the parent independent variable V are treated as intermediate variables between V and y. Then, following the method of path analysis [6][4][2], B_v is obtained as a weighted sum of the β_d:

B_v = Σ_{d=1}^{s−1} α_d β_d    (22)

where the weight α_d is the simple regression coefficient of x_d on V, subject to the constraint

Σ_{d=1}^{s−1} α_d = 1    (23)

The difference between the total effect b_v, namely the simple regression coefficient of y on the parent independent variable V, and B_v, the effect of V on y through
the variables x d, is the indirect effect of V on y [6][2] IV FORESTING onventionally, forecasting using regression methods where the parent independent variables are categorical and represented by codes is carried out using the method of least squares In this case, the value of the dependent variable to be predicted or forecasted, for given values of the independent variables, is obtained by inserting the values of these independent variables represented by codes in the fitted regression model, where the independent variable is coded from to n (n being the number of observations in ascending order of time, or the orthogonal coding system where the earliest in time is coded ( until the latest in time is coded ( ), increasing arithmetically by unit, if the number of observations are odd, example is t = -3,-2,-,,,2,3 if there are 7 observations, alternatively the codes will be -(n-), -(n-3), -(n-5), starting with the earliest in time to the latest in time, if the number of observations is even These codes are used as independent variables against the real dependent variable, y This method suffers from the restriction of equal spacing of levels in the codes using normal dummy variable regression models plus the additional difficulty of interpreting the regression coefficients generated wwwijesiorg 6 P a g e
In cases where the parent independent variables are each represented by a set of dummy variables of 1s and 0s, the use of direct effects as the forecast model coefficients is advocated. This use of direct effects as coefficients takes care of the requirement and constraint of equal spacing, and the coefficients are interpretable, hence more useful for practical purposes. The multiple regression model expressing the dependence or relationship between the dependent variable Y and the parent independent variables A, B, C, etc., represented by their respective sets of ordinal dummy variables, is

Y_i = β_0 + β_1;A X_i1;A + β_2;A X_i2;A + … + β_{a−1};A X_{i,a−1};A + β_1;B X_i1;B + β_2;B X_i2;B + … + β_{b−1};B X_{i,b−1};B + β_1;C X_i1;C + β_2;C X_i2;C + … + β_{c−1};C X_{i,c−1};C + … + e_i    (24)

where X_ij;A is the ordinal dummy variable representing the j-th level of factor A with regression effect β_j;A, j = 1, 2, …, a−1; X_ij;B is the ordinal dummy variable representing the j-th level of factor B with regression effect β_j;B, j = 1, 2, …, b−1; X_ij;C is the ordinal dummy variable representing the j-th level of factor C with regression effect β_j;C, j = 1, 2, …, c−1; etc., and e_i is the error term uncorrelated with the X_ij's, with E(e_i) = 0 for i = 1, 2, …, n. The expected value of equation (24) is

E(y_i) = β_0 + Σ_{j=1}^{a−1} β_j;A E(X_ij;A) + Σ_{j=1}^{b−1} β_j;B E(X_ij;B) + Σ_{j=1}^{c−1} β_j;C E(X_ij;C) + …    (25)

Now define the regression effect of any parent independent variable, say A, on the dependent variable y through the effects of the set of ordinal dummy variables representing that parent independent variable. We take the partial derivative of equation (25), the expected value of y, with respect to A, obtaining

dE(y_i)/dA = Σ_{j=1}^{a−1} β_j;A dE(X_ij;A)/dA + Σ_{j=1}^{b−1} β_j;B dE(X_ij;B)/dA + Σ_{j=1}^{c−1} β_j;C dE(X_ij;C)/dA + …

where dE(X_ij;A)/dA = α_j;A is the simple regression coefficient of X_ij;A regressed on the parent independent variable A, with Σ_{j=1}^{a−1} α_j;A = 1 (Oyeka, 1993), and dE(X_ij;Z)/dA = 0 for all parent independent variables Z ≠ A, Z = B, C, …, so that

dE(y_i)/dA = Σ_{j=1}^{a−1} α_j;A β_j;A

Hence the partial regression effect of the parent independent variable of factor A, through the
effects of its representative set of ordinal dummy variables, on the dependent variable y is

β_y;A = Σ_{j=1}^{a−1} α_j;A β_j;A    (26)

whose sample estimate is

B_A = Σ_{j=1}^{a−1} α_j;A b_j;A    (27)

where b_j;A is the sample estimate of the partial regression coefficient β_j;A, j = 1, 2, …, a−1. Estimates of the partial effects of other parent independent variables in the model are similarly obtained.

V. ILLUSTRATIVE EXAMPLE
Mean monthly maximum temperature data, in degrees centigrade, were collected by a research centre for five consecutive years (2007–2011), as shown in Table 2 below.

Table 2: Mean Monthly Maximum Temperature (°C), by year (2007–2011) and month (Jan–Dec); the numerical entries are not recoverable from the source.
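Given data laid out as in Table 2, the cumulative coding of equation (5) can be sketched as follows (the layout is shrunk to 2 years × 3 months for readability; sizes and names here are illustrative assumptions):

```python
import numpy as np

def cumulative_dummies(levels, z):
    """Cumulatively coded dummies of equation (5): column d (d = 1..z-1)
    is 1 for observations whose level exceeds d, else 0."""
    levels = np.asarray(levels)
    return np.column_stack([(levels > d).astype(int) for d in range(1, z)])

# Two years x three months, observations arranged in time order.
year = np.repeat([1, 2], 3)             # parent variable for year
month = np.tile([1, 2, 3], 2)           # parent variable for month
X_year = cumulative_dummies(year, 2)    # z - 1 = 1 dummy column
X_month = cumulative_dummies(month, 3)  # z - 1 = 2 dummy columns
design = np.column_stack([X_year, X_month])
print(design)
# first-level rows are all 0s in their columns; last-level rows all 1s
```

The printed matrix reproduces the pattern of equation (6): each column switches from 0 to 1 exactly once as its parent variable passes level d, which is what distinguishes cumulative coding from ordinary one-hot dummies.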
Applying equation (5) to obtain the design matrix, note must be taken that two independent variables are involved here: year, represented in the dummy design matrix as X_id, d = 1, 2, 3, 4, and month, represented in the dummy design matrix as M_id, d = 1, 2, …, 11. Note also that the design matrix is obtained using the cumulative coding system proposed in equation (5) for both the years and the months. Table 3 below shows the cumulatively coded design matrix for the year and month variables. Note that one year (2011) was dropped, and that one month (Dec) was dropped. P_m and P_x are the parent independent variables for month and year respectively.

Table 3: Cumulatively coded design matrix for the maximum temperature data, with columns P_m, P_x, X_1–X_4 and M_1–M_11; the entries are not recoverable from the source.
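The regressions described next, of each dummy variable on its parent variable, supply the weights α_d of equation (22); a sketch follows, with hypothetical β values standing in for the fitted partial coefficients (Table 5's actual values are not reproduced here):

```python
import numpy as np

def cumulative_dummies(levels, z):
    levels = np.asarray(levels)
    return np.column_stack([(levels > d).astype(int) for d in range(1, z)])

def simple_slope(x, v):
    """Simple regression coefficient of x on v: cov(x, v) / var(v)."""
    v = np.asarray(v, float)
    return np.cov(x, v, bias=True)[0, 1] / np.var(v)

# A parent variable with 4 levels (e.g. year coded 1..4, one value per
# month), and its cumulative dummies.
v = np.repeat([1, 2, 3, 4], 12)
Xd = cumulative_dummies(v, 4)
alpha = np.array([simple_slope(Xd[:, d], v) for d in range(Xd.shape[1])])
print(alpha.sum())   # 1.0 -- the constraint of equation (23)

# Direct effect of equation (22): a weighted sum of the partial
# coefficients. These beta values are hypothetical placeholders.
beta = np.array([0.8, 0.5, 0.3])
B_v = alpha @ beta
print(B_v)
```

That the weights sum to exactly 1 is not a coincidence of this example: the cumulative dummies of a parent variable V sum pointwise to V − 1, so their simple-regression slopes on V must sum to cov(V, V)/var(V) = 1.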
For the direct effects, the dummy variables X_id (d = 1, 2, 3, 4), as independent variables, are each regressed on P_x, the parent independent variable for year, using the method of least squares. Similarly, the dummy variables M_id (d = 1, 2, …, 11), as independent variables, are each regressed on P_m, the parent independent variable for month, by the same method of least squares. Table 4 below shows the outputs of these regressions.

Table 4: Simple regression coefficients α_i from regressing each dummy variable on its parent variable (X_1–X_4 on P_x; M_1–M_11 on P_m); the coefficient values are not recoverable from the source.

Note that the simple regression coefficients for each of the parent variables (year and month) sum to 1. Recall that the direct effect for the years, B_x, is obtained as B_x = Σ_{i=1}^{4} α_i β_i, where the β_i are the regression coefficients obtained from regressing the maximum temperature data (as dependent variable) on the cumulatively coded dummy variable design matrix (Table 3 without P_x and P_m), and the α_i are the simple regression coefficients related to the year dummies X_1, X_2, X_3, X_4 respectively, as shown in Table 4 above. B_m = Σ α_i β_i is obtained similarly for the months, where the α_i are the simple regression coefficients related to the month dummies M_1, M_2, …, M_11 respectively, as shown in Table 4. Table 5 below shows the coefficients (β) obtained from the regression.
Table 5: Regression coefficients (β_i) and p-values from the regression of the maximum temperature data on the cumulatively coded dummy variable design matrix (constant, X_1–X_4, M_1–M_11); the coefficient values are not recoverable from the source.

For the regression above, we have R² = 0.877 and MSE = 0.979 with p-value = 0.000; hence a significant multiple regression exists.

Direct Effects and Tests of Significance
The direct effect for year, B_x = Σ_{i=1}^{4} α_i β_i, is obtained as 3.3 × 10⁻³. Similarly, the direct effect for month, B_m = Σ_{i=1}^{11} α_i β_i, is −0.257. To test the significance of the direct effect for the years, B_x, the statistic

t = B_x / √Var(B_x)

follows the Student t distribution with n−p degrees of freedom, where p is the number of parameters in the model. Var(B_x) = Var(α'β) = α'Var(β)α, where Var(β) = MSE(X_d'X_d)⁻¹ [1], so Var(B_x) = α'MSE(X_d'X_d)⁻¹α; this yields Var(B_x) = 0.0000167 and hence t_x = 0.8. For 60 − 4 = 56 degrees of freedom, a p-value of more than 0.76 was observed; this supports the hypothesis of no significant direct effect, due to the years, on the regression equation. Similarly, for a significance test of the direct effect of the months, B_m, on the regression equation, Var(B_m) is obtained. Hence
t_m = B_m / √Var(B_m) = −1.83. For n − p = 60 − 11 = 49 degrees of freedom, a p-value of less than 0.05 was observed. This supports a significant direct effect due to the months.

Forecasting
The forecast equation for maximum temperature is therefore of the form

Ť_ym = B_0 + B_x t_xi + B_m t_mi,  i = 1, 2, …, 12    (28)

where B_0 is the overall mean effect, estimated by the constant term obtained as in Table 5 (the value here is 33.17); B_x is the direct effect of year on maximum temperature obtained by the method of path analysis (the value here is 3.3 × 10⁻³), which is not statistically significant and hence is dropped; and B_m is the direct effect of month on maximum temperature, also obtained by the method of path analysis (the value here is −0.257), which is statistically significant. [7] interpreted path coefficients (direct effects) thus: given a path diagram A → B with path coefficient θ = 0.8, if region A increases by 1 standard deviation from its mean, region B would be expected to increase by 0.8 of its own standard deviation from its own mean, holding all other relevant regional connections constant. Our forecast equation here, using the direct effects as coefficients, is

Ť = 33.17 − 0.257 t_mi,  i = 1, 2, …, 12    (29)

where t_mi is the serial count of the months from January (t_mi = 1) to December (t_mi = 12) of each year. If a forecast for December 2012, say, is required, then t_mi = 12 and Ť = 33.17 − 0.257(12) = 30.09 °C. Similarly, if a forecast for February 2013 is required, t_mi = 2, and Ť = 33.17 − 0.257(2) = 32.66 °C.

VI. CONCLUSION
So far it has been demonstrated how one can make forecasts on a variable which can be arranged in any sequence, including time order, using the multiple regression approach and the robust method of dummy variables, with the design matrix obtained by a cumulative coding arrangement. The direct effects, obtained through the method of path analysis, are used as parameter estimates where they are significant. This procedure can be viewed as having the ability to break down a single sequence of data into its hitherto not visible
components, thus creating a way into the bits and pieces that make up the variable, with the relevant pieces reassembled to obtain a forecasting model for the variable.

REFERENCES
[1] Neter, J.; Wasserman, W.; Kutner, M.H. (1983): Applied Linear Regression Models. Richard D. Irwin Inc., Illinois.
[2] Oyeka, I.C.A. (1993): Estimating Effects in Ordinal Dummy Variable Regression. STATISTICA, anno LIII, n. 2, pp. 261–268.
[3] Boyle, R.P. (1970): Path Analysis and Ordinal Data. American Journal of Sociology, 75, 461–480.
[4] Lyons, M. (1971): Techniques for Using Ordinal Measures in Regression and Path Analysis, in Herbert Costner (ed.), Sociological Methodology, Jossey-Bass Publishers, San Francisco.
[5] Wikipedia, the free encyclopedia (2012): Path Analysis (Statistics). Modified 2 November 2012.
[6] Wright, S. (1960): Path Coefficients and Path Regressions: Alternative or Complementary Concepts? Biometrics, 16, pp. 189–202.
[7] Brian, P.: Structural Equation Modelling (SEM) or Path Analysis. http://afni.nimh.nih.gov
Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.
More informationWe like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
Statistical Methods in Business Lecture 5. Linear Regression We like to capture and represent the relationship between a set of possible causes and their response, by using a statistical predictive model.
More informationSimple Linear Regression
9-1 l Chapter 9 l Simple Linear Regression 9.1 Simple Linear Regression 9.2 Scatter Diagram 9.3 Graphical Method for Determining Regression 9.4 Least Square Method 9.5 Correlation Coefficient and Coefficient
More informationSTAT Chapter 11: Regression
STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship
More information16.3 One-Way ANOVA: The Procedure
16.3 One-Way ANOVA: The Procedure Tom Lewis Fall Term 2009 Tom Lewis () 16.3 One-Way ANOVA: The Procedure Fall Term 2009 1 / 10 Outline 1 The background 2 Computing formulas 3 The ANOVA Identity 4 Tom
More information6. Multiple Linear Regression
6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X
More informationSummary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)
Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ
More informationRegression used to predict or estimate the value of one variable corresponding to a given value of another variable.
CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent
More informationCh 3: Multiple Linear Regression
Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationInference for Regression Simple Linear Regression
Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating
More informationChapter 8 Heteroskedasticity
Chapter 8 Walter R. Paczkowski Rutgers University Page 1 Chapter Contents 8.1 The Nature of 8. Detecting 8.3 -Consistent Standard Errors 8.4 Generalized Least Squares: Known Form of Variance 8.5 Generalized
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationAnalysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total
Math 221: Linear Regression and Prediction Intervals S. K. Hyde Chapter 23 (Moore, 5th Ed.) (Neter, Kutner, Nachsheim, and Wasserman) The Toluca Company manufactures refrigeration equipment as well as
More information11 Hypothesis Testing
28 11 Hypothesis Testing 111 Introduction Suppose we want to test the hypothesis: H : A q p β p 1 q 1 In terms of the rows of A this can be written as a 1 a q β, ie a i β for each row of A (here a i denotes
More informationLecture 6 Multiple Linear Regression, cont.
Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression
More informationLECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit
LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define
More informationSTAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)
STAT40 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 018 3:00 4:15p SOLUTIONS (Yellow) Question 1 (15 points) (10 points) 3 (50 points) extra ( points) Total (77 points) Points
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationDiscrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test
Discrete distribution Fitting probability models to frequency data A probability distribution describing a discrete numerical random variable For example,! Number of heads from 10 flips of a coin! Number
More informationChapter 7 Student Lecture Notes 7-1
Chapter 7 Student Lecture Notes 7- Chapter Goals QM353: Business Statistics Chapter 7 Multiple Regression Analysis and Model Building After completing this chapter, you should be able to: Explain model
More informationStatistics for Managers using Microsoft Excel 6 th Edition
Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of
More informationChapter 16. Simple Linear Regression and Correlation
Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationSTAT 212 Business Statistics II 1
STAT 1 Business Statistics II 1 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICAL SCIENCES DHAHRAN, SAUDI ARABIA STAT 1: BUSINESS STATISTICS II Semester 091 Final Exam Thursday Feb
More informationNon-Parametric Two-Sample Analysis: The Mann-Whitney U Test
Non-Parametric Two-Sample Analysis: The Mann-Whitney U Test When samples do not meet the assumption of normality parametric tests should not be used. To overcome this problem, non-parametric tests can
More informationTesting Linear Restrictions: cont.
Testing Linear Restrictions: cont. The F-statistic is closely connected with the R of the regression. In fact, if we are testing q linear restriction, can write the F-stastic as F = (R u R r)=q ( R u)=(n
More informationT-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum
T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222
More information22s:152 Applied Linear Regression. Take random samples from each of m populations.
22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each
More informationMultiple comparisons - subsequent inferences for two-way ANOVA
1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of
More informationMultiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company
Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple
More informationSection 3: Simple Linear Regression
Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction
More informationThe legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.
1 Chapter 1: Research Design Principles The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 2 Chapter 2: Completely Randomized Design
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationOne-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups
One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The
More informationFactorial designs. Experiments
Chapter 5: Factorial designs Petter Mostad mostad@chalmers.se Experiments Actively making changes and observing the result, to find causal relationships. Many types of experimental plans Measuring response
More information22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA
22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each
More informationSIMPLE REGRESSION ANALYSIS. Business Statistics
SIMPLE REGRESSION ANALYSIS Business Statistics CONTENTS Ordinary least squares (recap for some) Statistical formulation of the regression model Assessing the regression model Testing the regression coefficients
More informationLecture 9 SLR in Matrix Form
Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.
More informationSTA 2101/442 Assignment Four 1
STA 2101/442 Assignment Four 1 One version of the general linear model with fixed effects is y = Xβ + ɛ, where X is an n p matrix of known constants with n > p and the columns of X linearly independent.
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More informationCHAPTER EIGHT Linear Regression
7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationSTA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.
STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationY i = η + ɛ i, i = 1,...,n.
Nonparametric tests If data do not come from a normal population (and if the sample is not large), we cannot use a t-test. One useful approach to creating test statistics is through the use of rank statistics.
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationSTK4900/ Lecture 3. Program
STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies
More informationAnalysis of Variance (ANOVA)
Analysis of Variance ANOVA) Compare several means Radu Trîmbiţaş 1 Analysis of Variance for a One-Way Layout 1.1 One-way ANOVA Analysis of Variance for a One-Way Layout procedure for one-way layout Suppose
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationTopic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model
Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More information17: INFERENCE FOR MULTIPLE REGRESSION. Inference for Individual Regression Coefficients
17: INFERENCE FOR MULTIPLE REGRESSION Inference for Individual Regression Coefficients The results of this section require the assumption that the errors u are normally distributed. Let c i ij denote the
More information