Chapter 8 Indicator Variables

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

In general, the explanatory variables in any regression analysis are assumed to be quantitative in nature. For example, variables like temperature, distance, age etc. are quantitative in the sense that they are recorded on a well-defined scale. In many applications, the variables cannot be defined on a well-defined scale, and they are qualitative in nature. For example, variables like sex (male or female), colour (black, white), nationality and employment status (employed, unemployed) are defined on a nominal scale. Such variables do not have any natural scale of measurement. They usually indicate the presence or absence of a quality or an attribute, like employed or unemployed, graduate or non-graduate, smoker or non-smoker, yes or no, acceptance or rejection, so they are defined on a nominal scale. Such variables can be quantified by artificially constructing a variable that takes the values, e.g., 1 and 0, where 1 usually indicates the presence of the attribute and 0 usually indicates its absence. For example, 1 may indicate that the person is male and 0 that the person is female. Similarly, 1 may indicate that the person is employed and 0 that the person is unemployed. Such variables classify the data into mutually exclusive categories. These variables are called indicator variables or dummy variables.

Usually, the indicator variables take on the values 0 and 1 to identify the mutually exclusive classes of the explanatory variables. For example,

D = 1 if the person is male, D = 0 if the person is female;
D = 1 if the person is employed, D = 0 if the person is unemployed.

Here we use the notation D in place of X to denote the dummy variable. The choice of 1 and 0 to identify a category is arbitrary. For example, one can also define the dummy variables in the above examples as

D = 1 if the person is female, D = 0 if the person is male;
D = 1 if the person is unemployed, D = 0 if the person is employed.

It is also not necessary to choose only 1 and 0 to denote the categories. In fact, any two distinct values of D will serve the purpose. The choices of 1 and 0 are preferred because they keep the calculations simple, help in easy interpretation of the values and usually turn out to be a satisfactory choice.

In a given regression model, qualitative and quantitative variables can also occur together, i.e., some variables are qualitative and others are quantitative. When the explanatory variables are
- all quantitative, then the model is called a regression model;
- all qualitative, then the model is called an analysis of variance model; and
- both quantitative and qualitative, then the model is called an analysis of covariance model.
Such models can be dealt with within the framework of regression analysis, and the usual tools of regression analysis can be used in the case of dummy variables.

Example: Consider the following model with x_i as a quantitative variable and D_i as an indicator variable:

y_i = β_0 + β_1 x_i + β_2 D_i + ε_i,  E(ε_i) = 0, Var(ε_i) = σ²,

with
D_i = 0 if an observation belongs to group A,
D_i = 1 if an observation belongs to group B.

The interpretation of the result is important. We proceed as follows.

If D_i = 0, then

y_i = β_0 + β_1 x_i + β_2·0 + ε_i = β_0 + β_1 x_i + ε_i,
E(y_i | D_i = 0) = β_0 + β_1 x_i,

which is a straight-line relationship with intercept β_0 and slope β_1.

If D_i = 1, then

y_i = β_0 + β_1 x_i + β_2·1 + ε_i = (β_0 + β_2) + β_1 x_i + ε_i,
E(y_i | D_i = 1) = (β_0 + β_2) + β_1 x_i,

which is a straight-line relationship with intercept (β_0 + β_2) and slope β_1.

The quantities E(y_i | D_i = 0) and E(y_i | D_i = 1) are the average responses when an observation belongs to group A and group B, respectively. Thus

β_2 = E(y_i | D_i = 1) − E(y_i | D_i = 0),

which has an interpretation as the difference between the average values of y with D_i = 1 and D_i = 0. Graphically, this appears as two parallel regression lines with the same variance σ², one line for each group.
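As a quick illustration of this two-group model, here is a minimal Python sketch (the simulated data, coefficient values and variable names are illustrative assumptions, not part of the notes) that fits y_i = β_0 + β_1 x_i + β_2 D_i + ε_i by ordinary least squares and reads off the two parallel lines.

```python
# Minimal sketch of y = beta0 + beta1*x + beta2*D + eps with indicator D
# (0 = group A, 1 = group B); all simulated values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
D = (rng.uniform(size=n) < 0.5).astype(float)          # group indicator
y = 2.0 + 1.5 * x + 4.0 * D + rng.normal(0, 1.0, n)    # true betas = (2, 1.5, 4)

X = np.column_stack([np.ones(n), x, D])                 # design matrix [1, x, D]
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print("group A line: intercept %.2f, slope %.2f" % (b0, b1))
print("group B line: intercept %.2f, slope %.2f" % (b0 + b2, b1))  # same slope
```

The two fitted lines share the slope b1 and differ only by the intercept shift b2, matching the interpretation of β_2 above.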

If there are three explanatory variables in the model, two of them being indicator variables D_{i2} and D_{i3}, then the indicator variables describe three levels, e.g., groups A, B and C. One coding of the levels of the indicator variables is as follows:
1. D_{i2} = 0, D_{i3} = 0 if the observation is from group A,
2. D_{i2} = 1, D_{i3} = 0 if the observation is from group B,
3. D_{i2} = 0, D_{i3} = 1 if the observation is from group C.
The concerned regression model is

y_i = β_0 + β_1 x_i + β_2 D_{i2} + β_3 D_{i3} + ε_i,  E(ε_i) = 0, Var(ε_i) = σ².

In general, if a qualitative variable has m levels, then (m − 1) indicator variables are required, and each of them takes the values 0 and 1. Consider the following examples to understand how to define such indicator variables and how they can be handled.

Example: Suppose y_i denotes the monthly salary of the i-th person and D_i denotes whether the person is a graduate or a non-graduate:
D_i = 1 if the person is a graduate, D_i = 0 if the person is a non-graduate.
With n observations, the model is

y_i = β_0 + β_1 D_i + ε_i,  E(ε_i) = 0, Var(ε_i) = σ², i = 1, 2, ..., n,

so that
E(y_i | D_i = 0) = β_0,
E(y_i | D_i = 1) = β_0 + β_1,
β_1 = E(y_i | D_i = 1) − E(y_i | D_i = 0).
Thus
- β_0 measures the mean salary of a non-graduate, and
- β_1 measures the difference in the mean salaries of a graduate and a non-graduate person.
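A tiny numeric sketch of this salary example (the salary figures below are made-up illustrative assumptions): with a single dummy D, the OLS intercept equals the mean salary of the D = 0 group and the dummy coefficient equals the difference of the two group means.

```python
# Hypothetical salaries; 1 = graduate, 0 = non-graduate.
import numpy as np

salary = np.array([30.0, 32.0, 31.0, 45.0, 47.0, 46.0])
D = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

X = np.column_stack([np.ones(salary.size), D])
beta0, beta1 = np.linalg.lstsq(X, salary, rcond=None)[0]
print(beta0, salary[D == 0].mean())                          # mean non-graduate salary
print(beta1, salary[D == 1].mean() - salary[D == 0].mean())  # mean difference
```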

Now consider the same model with two indicator variables defined in the following way:
D_{i1} = 1 if the person is a graduate, D_{i1} = 0 if the person is a non-graduate;
D_{i2} = 1 if the person is a non-graduate, D_{i2} = 0 if the person is a graduate.
The model with n observations is

y_i = β_0 + β_1 D_{i1} + β_2 D_{i2} + ε_i,  E(ε_i) = 0, Var(ε_i) = σ², i = 1, 2, ..., n.

Then we have
1. E[y_i | D_{i1} = 0, D_{i2} = 1] = β_0 + β_2 : average salary of a non-graduate.
2. E[y_i | D_{i1} = 1, D_{i2} = 0] = β_0 + β_1 : average salary of a graduate.
3. E[y_i | D_{i1} = 0, D_{i2} = 0] = β_0 : cannot exist.
4. E[y_i | D_{i1} = 1, D_{i2} = 1] = β_0 + β_1 + β_2 : cannot exist.

Notice that in this case D_{i1} + D_{i2} = 1 for all i, which is an exact linear constraint: whether the person is a graduate or a non-graduate, D_{i1} + D_{i2} = 1. So multicollinearity is present in such cases. Hence the rank of the matrix of explanatory variables falls short by one, so β_0, β_1 and β_2 are indeterminate and the least squares method breaks down. Thus the proposition of introducing two indicator variables appears useful, but it leads to serious consequences. This is known as the dummy variable trap.

If the intercept term is dropped, then the model becomes

y_i = β_1 D_{i1} + β_2 D_{i2} + ε_i,  E(ε_i) = 0, Var(ε_i) = σ², i = 1, 2, ..., n,

and then
E(y_i | D_{i1} = 1, D_{i2} = 0) = β_1 : average salary of a graduate,
E(y_i | D_{i1} = 0, D_{i2} = 1) = β_2 : average salary of a non-graduate.
So when the intercept term is dropped, β_1 and β_2 have proper interpretations as the average salaries of a graduate and a non-graduate person, respectively. Now the parameters can be estimated using the ordinary least squares principle, and the standard procedures for drawing inferences can be used.

Rule: When the explanatory variable leads to a classification into m mutually exclusive categories, use (m − 1) indicator variables for its representation. Alternatively, use m indicator variables but drop the intercept term.
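The trap is easy to see numerically. The following short sketch (illustrative data, not from the notes) builds the three design matrices discussed above and checks their column ranks with NumPy: the intercept-plus-both-dummies matrix is rank deficient, while dropping either one dummy or the intercept restores full column rank.

```python
# Dummy variable trap: intercept + indicators for *all* categories
# gives a rank-deficient design matrix (columns are linearly dependent).
import numpy as np

grad = np.array([1, 0, 1, 1, 0, 0], dtype=float)   # D1: graduate
nongrad = 1.0 - grad                               # D2: non-graduate, D1 + D2 = 1
ones = np.ones(grad.size)

X_trap = np.column_stack([ones, grad, nongrad])        # intercept + m dummies
X_m_minus_1 = np.column_stack([ones, grad])            # intercept + (m-1) dummies
X_no_intercept = np.column_stack([grad, nongrad])      # m dummies, no intercept

print(np.linalg.matrix_rank(X_trap))          # 2, although there are 3 columns
print(np.linalg.matrix_rank(X_m_minus_1))     # 2 = full column rank
print(np.linalg.matrix_rank(X_no_intercept))  # 2 = full column rank
```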

Interaction term:
Suppose a model has two explanatory variables, one quantitative variable and the other an indicator variable, and suppose the two interact, so that an explanatory variable formed as their interaction is added to the model:

y_i = β_0 + β_1 x_i + β_2 D_i + β_3 x_i D_i + ε_i,  E(ε_i) = 0, Var(ε_i) = σ², i = 1, 2, ..., n.

To interpret the model parameters, we proceed as follows. Suppose the indicator variable is given by
D_i = 0 if the person belongs to group A,
D_i = 1 if the person belongs to group B,
and y_i is the salary of the i-th person. Then

E(y_i | D_i = 0) = β_0 + β_1 x_i + β_2·0 + β_3 x_i·0 = β_0 + β_1 x_i.

This is a straight line with intercept β_0 and slope β_1. Next,

E(y_i | D_i = 1) = β_0 + β_1 x_i + β_2·1 + β_3 x_i·1 = (β_0 + β_2) + (β_1 + β_3) x_i.

This is a straight line with intercept (β_0 + β_2) and slope (β_1 + β_3). So the model

E(y_i) = β_0 + β_1 x_i + β_2 D_i + β_3 x_i D_i

allows different slopes and different intercept terms for the two groups. Thus
- β_2 reflects the change in the intercept term associated with the change in the group of the person, i.e., when the group changes from A to B;
- β_3 reflects the change in the slope associated with the change in the group of the person, i.e., when the group changes from A to B.
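A brief sketch of fitting the interaction model and reading off the two group-specific lines (the simulated data and coefficient values are illustrative assumptions):

```python
# Interaction model y = b0 + b1*x + b2*D + b3*x*D + eps: different intercepts
# and different slopes for the two groups.
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, n)
D = (rng.uniform(size=n) < 0.5).astype(float)
y = 1.0 + 2.0 * x + 3.0 * D + 0.5 * x * D + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x, D, x * D])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
print("group A (D=0): intercept %.2f, slope %.2f" % (b0, b1))
print("group B (D=1): intercept %.2f, slope %.2f" % (b0 + b2, b1 + b3))
```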

Fitting the model

y_i = β_0 + β_1 x_i + β_2 D_i + β_3 x_i D_i + ε_i

is equivalent to fitting two separate regression models corresponding to D_i = 1 and D_i = 0, i.e.,

y_i = β_0 + β_1 x_i + β_2·1 + β_3 x_i·1 + ε_i = (β_0 + β_2) + (β_1 + β_3) x_i + ε_i

and

y_i = β_0 + β_1 x_i + β_2·0 + β_3 x_i·0 + ε_i = β_0 + β_1 x_i + ε_i,

respectively.

Tests of hypotheses become convenient with an indicator variable. For example, if we want to test whether the two regression models are identical, the test of hypothesis involves testing

H_0: β_2 = β_3 = 0 against H_1: β_2 ≠ 0 and/or β_3 ≠ 0.

Acceptance of H_0 indicates that only a single model is necessary to explain the relationship. In another example, if the objective is to test that the two models differ with respect to the intercepts only and have the same slopes, then the test of hypothesis involves testing

H_0: β_3 = 0 against H_1: β_3 ≠ 0.
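The first of these tests can be carried out with the usual F statistic comparing the restricted model under H_0 with the full interaction model. The sketch below is illustrative only (simulated data; the restriction count and degrees of freedom follow the standard F-test formula):

```python
# F test of H0: beta2 = beta3 = 0 (the two group regressions are identical),
# comparing the restricted model y ~ 1 + x with the full interaction model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, n)
D = (rng.uniform(size=n) < 0.5).astype(float)
y = 1.0 + 2.0 * x + 3.0 * D + 0.5 * x * D + rng.normal(0, 1.0, n)

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

X_full = np.column_stack([np.ones(n), x, D, x * D])
X_restricted = np.column_stack([np.ones(n), x])      # model under H0
q = 2                                                # number of restrictions
df = n - X_full.shape[1]
F = ((rss(X_restricted, y) - rss(X_full, y)) / q) / (rss(X_full, y) / df)
print("F = %.2f, p = %.4g" % (F, stats.f.sf(F, q, df)))  # small p: reject H0
```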

Indicator variables versus a quantitative explanatory variable:
A quantitative explanatory variable can also be converted into indicator variables. For example, if the ages of persons are grouped as follows:
Group 1: 1 day to 3 years
Group 2: 3 years to 8 years
Group 3: 8 years to 12 years
Group 4: 12 years to 17 years
Group 5: 17 years to 25 years
then the variable age can be represented by four different indicator variables. Since it is difficult to collect data on individual ages, this helps in easy collection of the data. A disadvantage is that some loss of information occurs. For example, suppose the ages in years are 2, 3, 4, 5, 6, 7 and the indicator variable is defined as
D = 1 if the age of the person is ≥ 5 years, D = 0 if the age of the person is < 5 years.
Then these values become 0, 0, 0, 1, 1, 1. Now, looking at the value 1, one cannot determine whether it corresponds to age 5, 6 or 7 years. Moreover, if a quantitative explanatory variable is grouped into m categories, then (m − 1) parameters are required, whereas if the original variable is used as such, then only one parameter is required.

Treating a quantitative variable as a qualitative variable increases the complexity of the model and also reduces the degrees of freedom for error. This can affect the inferences if the data set is small; in large data sets, such an effect may be small. On the other hand, the use of indicator variables does not require any assumption about the functional form of the relationship between the study variable and the explanatory variable.

Regression analysis and analysis of variance:
The analysis of variance is often used for analyzing data from designed experiments, and there is a connection between the statistical tools used in the analysis of variance and in regression analysis. We consider the case of the analysis of variance in a one-way classification and establish its relation with regression analysis.

One-way classification:
Let there be k samples, each of size n, from k normally distributed populations N(µ_i, σ²), i = 1, 2, ..., k. The populations differ only in their means, but they have the same variance σ². This can be expressed as

y_ij = µ_i + ε_ij,  i = 1, 2, ..., k; j = 1, 2, ..., n,
     = µ + (µ_i − µ) + ε_ij
     = µ + τ_i + ε_ij,

where y_ij is the j-th observation for the i-th fixed treatment effect (factor level) τ_i = µ_i − µ, µ is the general mean effect, and the ε_ij are identically and independently distributed random errors following N(0, σ²).

Note that τ_i = µ_i − µ and Σ_{i=1}^{k} τ_i = 0. The null hypothesis is

H_0: τ_1 = τ_2 = ... = τ_k = 0 against H_1: τ_i ≠ 0 for at least one i.

Employing the method of least squares, we obtain the estimators of µ and τ_i by minimizing

S = Σ_{i=1}^{k} Σ_{j=1}^{n} ε_ij² = Σ_{i=1}^{k} Σ_{j=1}^{n} (y_ij − µ − τ_i)²,

which gives

∂S/∂µ = 0  ⇒  µ̂ = (1/(nk)) Σ_{i=1}^{k} Σ_{j=1}^{n} y_ij = ȳ_.. ,
∂S/∂τ_i = 0  ⇒  τ̂_i = ȳ_i. − µ̂ = ȳ_i. − ȳ_.. ,

where ȳ_i. = (1/n) Σ_{j=1}^{n} y_ij. Based on this, the corresponding test statistic is

F = [ n Σ_{i=1}^{k} (ȳ_i. − ȳ_..)² / (k − 1) ] / [ Σ_{i=1}^{k} Σ_{j=1}^{n} (y_ij − ȳ_i.)² / (k(n − 1)) ],

which follows the F-distribution with (k − 1) and k(n − 1) degrees of freedom when the null hypothesis is true. The decision rule is to reject H_0 whenever F ≥ F_α(k − 1, k(n − 1)), in which case it is concluded that the k treatment means are not identical.
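As a sanity check, the short sketch below (simulated data with assumed treatment means, illustrative only) computes this F statistic directly from the formula and compares it with scipy.stats.f_oneway.

```python
# One-way ANOVA F statistic for k samples of size n, computed from the formula
# above and checked against scipy.stats.f_oneway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, n = 3, 10
true_means = [5.0, 6.0, 8.0]                                # assumed treatment means
y = np.array([rng.normal(m, 1.0, n) for m in true_means])   # shape (k, n)

ybar_i = y.mean(axis=1)                             # treatment means  ybar_i.
ybar = y.mean()                                     # grand mean       ybar_..
ss_treat = n * np.sum((ybar_i - ybar) ** 2)
ss_error = np.sum((y - ybar_i[:, None]) ** 2)
F = (ss_treat / (k - 1)) / (ss_error / (k * (n - 1)))

print("F from the formula:", round(F, 3))
print("scipy.stats.f_oneway:", stats.f_oneway(*y))  # same F with its p-value
```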

Connection with regression:
To illustrate the connection between the fixed-effect one-way analysis of variance and regression, suppose there are 3 treatments, so that the model becomes

y_ij = µ + τ_i + ε_ij,  i = 1, 2, 3; j = 1, 2, ..., n.

The 3 treatments are the three levels of a qualitative factor. For example, temperature can have three possible levels: low, medium and high. They can be represented by two indicator variables as

D_1 = 1 if the observation is from treatment 1, D_1 = 0 otherwise;
D_2 = 1 if the observation is from treatment 2, D_2 = 0 otherwise.

The regression model can then be rewritten as

y_ij = β_0 + β_1 D_{1j} + β_2 D_{2j} + ε_ij,  i = 1, 2, 3; j = 1, 2, ..., n,

where
D_{1j} : value of the first indicator variable D_1 for the j-th observation receiving the i-th treatment,
D_{2j} : value of the second indicator variable D_2 for the j-th observation receiving the i-th treatment.
Note that
- the parameters in the regression model are β_0, β_1, β_2;
- the parameters in the analysis of variance model are µ, τ_1, τ_2, τ_3.
We now establish a relationship between the two sets of parameters.

Suppose treatment 1 is used on the j-th observation, so D_{1j} = 1 and D_{2j} = 0, and

y_1j = β_0 + β_1·1 + β_2·0 + ε_1j = β_0 + β_1 + ε_1j.

In the analysis of variance model, this is represented as

y_1j = µ + τ_1 + ε_1j = µ_1 + ε_1j, where µ_1 = µ + τ_1,

so that β_0 + β_1 = µ_1.

If treatment 2 is applied to the j-th observation, then
- in the regression model set-up, D_{1j} = 0 and D_{2j} = 1, and
  y_2j = β_0 + β_1·0 + β_2·1 + ε_2j = β_0 + β_2 + ε_2j;
- in the analysis of variance model set-up,
  y_2j = µ + τ_2 + ε_2j = µ_2 + ε_2j, where µ_2 = µ + τ_2;
so that β_0 + β_2 = µ_2.

When treatment 3 is used on the j-th observation, then
- in the regression model set-up, D_{1j} = 0 and D_{2j} = 0, and
  y_3j = β_0 + β_1·0 + β_2·0 + ε_3j = β_0 + ε_3j;
- in the analysis of variance model set-up,
  y_3j = µ + τ_3 + ε_3j = µ_3 + ε_3j, where µ_3 = µ + τ_3;
so that β_0 = µ_3.

So finally, there are the following three relationships:

β_0 + β_1 = µ_1,
β_0 + β_2 = µ_2,
β_0 = µ_3,

that is, β_0 = µ_3, β_1 = µ_1 − µ_3, β_2 = µ_2 − µ_3.

In general, if there are k treatments, then (k − 1) indicator variables are needed. The regression model is given by

y_ij = β_0 + β_1 D_{1j} + β_2 D_{2j} + ... + β_{k−1} D_{k−1,j} + ε_ij,  i = 1, 2, ..., k; j = 1, 2, ..., n,

where
D_{ij} = 1 if the j-th observation gets the i-th treatment, D_{ij} = 0 otherwise.
In this case, the relationship between the parameters is

β_0 = µ_k,
β_i = µ_i − µ_k, i = 1, 2, ..., k − 1.

So β_0 always estimates the mean of the k-th treatment, and β_i estimates the difference between the means of the i-th treatment and the k-th treatment.
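A closing sketch (simulated data; the treatment means are assumptions) that verifies this correspondence numerically: regressing the observations on an intercept and (k − 1) treatment dummies reproduces the estimates β̂_0 = ȳ_k. and β̂_i = ȳ_i. − ȳ_k. in this balanced design.

```python
# Verify the ANOVA-regression link: with (k-1) treatment dummies and an intercept,
# OLS gives beta0_hat = mean of treatment k and beta_i_hat = mean_i - mean_k.
import numpy as np

rng = np.random.default_rng(4)
k, n = 3, 10
y = np.array([rng.normal(m, 1.0, n) for m in (5.0, 6.0, 8.0)])  # shape (k, n)

treatment = np.repeat(np.arange(1, k + 1), n)                   # labels 1..k
yvec = y.ravel()
D = np.column_stack([(treatment == i).astype(float) for i in range(1, k)])
X = np.column_stack([np.ones(k * n), D])                        # [1, D1, ..., D_{k-1}]
beta = np.linalg.lstsq(X, yvec, rcond=None)[0]

sample_means = y.mean(axis=1)
print("beta0_hat  vs mean of treatment k:", beta[0], sample_means[-1])
print("beta_i_hat vs mean_i - mean_k:    ", beta[1:], sample_means[:-1] - sample_means[-1])
```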