LINEAR REGRESSION ANALYSIS. MODULE VIII Lecture Indicator Variables

LINEAR REGRESSION ANALYSIS MODULE VIII Lecture - 7 Indicator Variables
Dr. Shalabh, Department of Mathematics and Statistics, Indian Institute of Technology Kanpur

Indicator variables versus quantitative explanatory variable

A quantitative explanatory variable can be converted into indicator variables. For example, if the ages of persons are grouped as follows:

Group 1: 1 day to 3 years
Group 2: 3 years to 8 years
Group 3: 8 years to 12 years
Group 4: 12 years to 17 years
Group 5: 17 years to 25 years

then the variable age can be represented by four different indicator variables.

Since it is difficult to collect data on individual ages, this grouping makes data collection easier. A disadvantage is that some information is lost. For example, suppose the ages in years are 2, 3, 4, 5, 6, 7 and the indicator variable is defined as

$$D = \begin{cases} 1 & \text{if age of person is} \ge 5 \text{ years} \\ 0 & \text{if age of person is} < 5 \text{ years.} \end{cases}$$

Then these values become 0, 0, 0, 1, 1, 1. Looking at the value 1, one cannot determine whether it corresponds to age 5, 6 or 7 years.
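The information loss described above can be illustrated with a short sketch. It assumes the cutoff is placed so that ages 5, 6 and 7 are coded as 1; the names `ages`, `CUTOFF` and `indicator` are hypothetical and chosen only for this illustration.

```python
# Illustrative ages in years; the cutoff of 5 is an assumption of this sketch.
ages = [2, 3, 4, 5, 6, 7]
CUTOFF = 5

def indicator(age, cutoff=CUTOFF):
    """Return 1 if age is at or above the cutoff, else 0."""
    return 1 if age >= cutoff else 0

coded = [indicator(a) for a in ages]
print(coded)  # [0, 0, 0, 1, 1, 1]

# Collect the original ages that share each indicator value.
preimage = {}
for age, d in zip(ages, coded):
    preimage.setdefault(d, []).append(age)
print(preimage)  # {0: [2, 3, 4], 1: [5, 6, 7]}
```

Each indicator value maps back to several ages, so the original age cannot be recovered from the coded value alone.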

Moreover, if a quantitative explanatory variable is grouped into m categories, then (m - 1) parameters are required, whereas if the original variable is used as such, only one parameter is required. Treating a quantitative variable as a qualitative variable therefore increases the complexity of the model. The degrees of freedom for error are also reduced. This can affect the inferences if the data set is small; in large data sets, the effect may be small. On the other hand, the use of indicator variables does not require any assumption about the functional form of the relationship between the study and explanatory variables.
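A small counting sketch may make the cost in degrees of freedom concrete. The sample size `n_obs = 20` and the number of categories `m = 5` are hypothetical values, and `error_df` is a helper introduced only for this illustration; it assumes a model with an intercept.

```python
def error_df(n_obs, n_slopes):
    """Error degrees of freedom for a model with an intercept and n_slopes slopes."""
    return n_obs - (n_slopes + 1)

n_obs = 20  # hypothetical small sample
m = 5       # hypothetical number of categories

df_quantitative = error_df(n_obs, 1)    # original variable: one slope
df_indicator = error_df(n_obs, m - 1)   # grouped variable: m - 1 indicators
print(df_quantitative, df_indicator)    # 18 15
```

With a small sample the three lost degrees of freedom are a noticeable share of the total, which is the concern raised above.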

Regression analysis and analysis of variance

The analysis of variance is usually used in analyzing data from designed experiments. There is a connection between the statistical tools used in analysis of variance and regression analysis. We consider the case of analysis of variance in one way classification and establish its relation with regression analysis.

One way classification

Suppose there are $k$ samples, each of size $n$, from $k$ normally distributed populations $N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, k$. The populations differ only in their means but have the same variance $\sigma^2$. This can be expressed as

$$
\begin{aligned}
y_{ij} &= \mu_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n \\
       &= \mu + (\mu_i - \mu) + \varepsilon_{ij} \\
       &= \mu + \tau_i + \varepsilon_{ij}
\end{aligned}
$$

where $y_{ij}$ is the $j$th observation for the $i$th fixed treatment effect $\tau_i = \mu_i - \mu$ or factor level, $\mu$ is the general mean effect, and the $\varepsilon_{ij}$ are identically and independently distributed random errors following $N(0, \sigma^2)$. Note that

$$\tau_i = \mu_i - \mu, \qquad \sum_{i=1}^{k} \tau_i = 0.$$

The null and alternative hypotheses are

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$$
$$H_1: \tau_i \neq 0 \text{ for at least one } i.$$

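Employing the method of least squares, we obtain the estimators of $\mu$ and $\tau_i$ as follows:

$$S = \sum_{i=1}^{k} \sum_{j=1}^{n} \varepsilon_{ij}^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$$

$$\frac{\partial S}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{nk} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij} = \bar{y}$$

$$\frac{\partial S}{\partial \tau_i} = 0 \;\Rightarrow\; \hat{\tau}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij} - \hat{\mu} = \bar{y}_i - \bar{y}$$

where $\bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$, and the constraint $\sum_{i=1}^{k} \tau_i = 0$ is used in solving for $\hat{\mu}$.

Based on this, the corresponding test statistic is

$$F_0 = \frac{\dfrac{n}{k-1} \displaystyle\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{\dfrac{1}{k(n-1)} \displaystyle\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2}$$

which follows the F-distribution with $k - 1$ and $k(n - 1)$ degrees of freedom when the null hypothesis is true. The decision rule is to reject $H_0$ whenever $F_0 \ge F_\alpha(k - 1, k(n - 1))$, and it is then concluded that the $k$ treatment means are not identical.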
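As a rough numerical check of the F statistic above, the following sketch computes it for k groups of equal size n. The data values and the function name `one_way_anova_f` are made up for illustration; the critical value would still have to be taken from an F table or a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F0, df1, df2) for k groups that all have the same size n."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    group_means = [sum(g) / n for g in groups]
    # With equal group sizes the grand mean is the mean of the group means.
    grand_mean = sum(group_means) / k

    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df1, df2 = k - 1, k * (n - 1)
    f0 = (ss_between / df1) / (ss_within / df2)
    return f0, df1, df2

# Hypothetical data: k = 3 treatments, n = 3 observations each.
data = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
f0, df1, df2 = one_way_anova_f(data)
print(f0, df1, df2)  # 27.0 2 6
```

Here the between-group sum of squares is 54 and the within-group sum of squares is 6, so F0 = (54/2)/(6/6) = 27 on 2 and 6 degrees of freedom.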

Connection with regression

To illustrate the connection between fixed effect one way analysis of variance and regression, suppose there are 3 treatments so that the model becomes

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, 3; \; j = 1, 2, \ldots, n.$$

The 3 treatments are the three levels of a qualitative factor. For example, the temperature can have three possible levels: low, medium and high. They can be represented by two indicator variables as

$$D_1 = \begin{cases} 1 & \text{if the observation is from treatment 1} \\ 0 & \text{otherwise,} \end{cases}
\qquad
D_2 = \begin{cases} 1 & \text{if the observation is from treatment 2} \\ 0 & \text{otherwise.} \end{cases}$$

The regression model can be rewritten as

$$y_{ij} = \beta_0 + \beta_1 D_{1j} + \beta_2 D_{2j} + \varepsilon_{ij}, \quad i = 1, 2, 3; \; j = 1, 2, \ldots, n$$

where
$D_{1j}$: value of $D_1$ for the $j$th observation with the 1st treatment,
$D_{2j}$: value of $D_2$ for the $j$th observation with the 2nd treatment.

Note that
the parameters in the regression model are $\beta_0, \beta_1, \beta_2$;
the parameters in the analysis of variance model are $\mu, \tau_1, \tau_2, \tau_3$.

We establish a relationship between the two sets of parameters.
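The coding of three treatment levels by two indicator variables can be sketched as follows. The mapping of "low", "medium" and "high" to treatments 1, 2 and 3, and the names `code_level`, `observations` and `rows`, are assumptions made only for this illustration.

```python
def code_level(level):
    """Return (D1, D2) for a level; 'high' (treatment 3) is the reference level."""
    d1 = 1 if level == "low" else 0      # treatment 1
    d2 = 1 if level == "medium" else 0   # treatment 2
    return d1, d2

# Hypothetical sequence of observed treatment levels.
observations = ["low", "high", "medium", "low"]

# Each design-matrix row is (intercept, D1, D2).
rows = [(1,) + code_level(level) for level in observations]
for level, row in zip(observations, rows):
    print(level, row)
# low (1, 1, 0)
# high (1, 0, 0)
# medium (1, 0, 1)
# low (1, 1, 0)
```

The third level needs no indicator of its own: it is identified by both indicators being zero.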

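Suppose treatment 1 is used on the $j$th observation, so $D_{1j} = 1$, $D_{2j} = 0$ and

$$
\begin{aligned}
y_{1j} &= \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \varepsilon_{1j} \\
       &= \beta_0 + \beta_1 + \varepsilon_{1j}.
\end{aligned}
$$

In the analysis of variance model, this is represented as

$$
\begin{aligned}
y_{1j} &= \mu + \tau_1 + \varepsilon_{1j} \\
       &= \mu_1 + \varepsilon_{1j} \quad \text{where } \mu_1 = \mu + \tau_1
\end{aligned}
$$

$$\Rightarrow \beta_0 + \beta_1 = \mu_1.$$

If treatment 2 is applied on the $j$th observation, then

- in the regression model set up, $D_{1j} = 0$, $D_{2j} = 1$ and

$$
\begin{aligned}
y_{2j} &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \varepsilon_{2j} \\
       &= \beta_0 + \beta_2 + \varepsilon_{2j};
\end{aligned}
$$

- in the analysis of variance model set up,

$$
\begin{aligned}
y_{2j} &= \mu + \tau_2 + \varepsilon_{2j} \\
       &= \mu_2 + \varepsilon_{2j} \quad \text{where } \mu_2 = \mu + \tau_2
\end{aligned}
$$

$$\Rightarrow \beta_0 + \beta_2 = \mu_2.$$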

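When treatment 3 is used on the $j$th observation, then

- in the regression model set up, $D_{1j} = D_{2j} = 0$ and

$$
\begin{aligned}
y_{3j} &= \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \varepsilon_{3j} \\
       &= \beta_0 + \varepsilon_{3j};
\end{aligned}
$$

- in the analysis of variance model set up,

$$
\begin{aligned}
y_{3j} &= \mu + \tau_3 + \varepsilon_{3j} \\
       &= \mu_3 + \varepsilon_{3j} \quad \text{where } \mu_3 = \mu + \tau_3
\end{aligned}
$$

$$\Rightarrow \beta_0 = \mu_3.$$

So finally, there are the following three relationships:

$$\beta_0 + \beta_1 = \mu_1, \qquad \beta_0 + \beta_2 = \mu_2, \qquad \beta_0 = \mu_3$$

$$\Rightarrow \quad \beta_0 = \mu_3, \qquad \beta_1 = \mu_1 - \mu_3, \qquad \beta_2 = \mu_2 - \mu_3.$$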

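In general, if there are $k$ treatments, then $(k - 1)$ indicator variables are needed. The regression model is given by

$$y_{ij} = \beta_0 + \beta_1 D_{1j} + \beta_2 D_{2j} + \cdots + \beta_{k-1} D_{k-1,j} + \varepsilon_{ij}, \quad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n$$

where

$$D_{ij} = \begin{cases} 1 & \text{if the } j\text{th observation gets the } i\text{th treatment} \\ 0 & \text{otherwise.} \end{cases}$$

In this case, the relationship is

$$\beta_0 = \mu_k, \qquad \beta_i = \mu_i - \mu_k, \quad i = 1, 2, \ldots, k - 1.$$

So $\beta_0$ always estimates the mean of the $k$th treatment and $\beta_i$ estimates the difference between the means of the $i$th treatment and the $k$th treatment.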
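The relationship between the regression coefficients and the treatment means can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are made up, the last group is taken as the reference treatment, and the helpers `solve` and `fit_indicator_regression` are hypothetical names that solve the normal equations in plain Python rather than with a statistics library.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_indicator_regression(groups):
    """Least-squares fit of y on an intercept and k - 1 indicators.

    The last group is the reference treatment (all indicators equal to 0).
    """
    k = len(groups)
    X, y = [], []
    for i, g in enumerate(groups):
        for obs in g:
            X.append([1.0] + [1.0 if i == c else 0.0 for c in range(k - 1)])
            y.append(obs)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[u] * r[v] for r in X) for v in range(k)] for u in range(k)]
    xty = [sum(r[u] * yi for r, yi in zip(X, y)) for u in range(k)]
    return solve(xtx, xty)

# Hypothetical data: k = 3 treatments, n = 3 observations each.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
beta = fit_indicator_regression(groups)
means = [sum(g) / len(g) for g in groups]

print([round(b, 6) for b in beta])  # [2.0, 3.0, 6.0]
print(means)                        # [5.0, 8.0, 2.0]
```

The fitted intercept equals the sample mean of the last treatment (2), and the two slopes equal the differences 5 - 2 = 3 and 8 - 2 = 6, matching the relationship stated above with sample means in place of population means.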