HG                                                                Revised Sept. 2018

Supplement to lecture 9 (Tuesday 18 Sept)

On the bivariate normal model

Example: daughter's height (Y) vs. mother's height (X). Data collected in Econ 2130 lectures 2010-2012. The data can be downloaded as an Excel file under Econ 2130 at http://folk.uio.no/haraldg/.

n = 15 observation pairs: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) of the random pair (X, Y)

Figure 1  Scatter plot

Data summary:

    Mothers:    x̄ = 166.9,   σ̂_x = s_x = 5.83
    Daughters:  ȳ = 167.6,   σ̂_y = s_y = 5.5938

where x̄ = (1/n) Σ_{i=1}^n x_i and s_x² = [1/(n−1)] Σ_{i=1}^n (x_i − x̄)² (sample variance), and similarly for ȳ and s_y².

Correlation:

    r = ρ̂ = s_xy / (s_x s_y) = 0.36,   where s_xy = [1/(n−1)] Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) (sample covariance)

Population: all mother-daughter pairs in Norway (with daughters at least 18 years old, say).
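The summary statistics above (means, sample standard deviations with the n−1 denominator, and the sample correlation r) are straightforward to compute. The sketch below uses Python rather than Stata/Excel, and a small made-up dataset, since the actual height data are not reproduced in this note:

```python
import math

def summary_stats(xs, ys):
    """Sample means, sample SDs (n-1 denominator) and correlation r."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs) / (n - 1)      # sample variance of x
    syy = sum((y - ybar) ** 2 for y in ys) / (n - 1)      # sample variance of y
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(xs, ys)) / (n - 1)          # sample covariance
    return xbar, ybar, math.sqrt(sxx), math.sqrt(syy), sxy / math.sqrt(sxx * syy)

# Made-up (x, y) pairs, NOT the Econ 2130 height data
xs = [160.0, 165.0, 170.0, 175.0]
ys = [162.0, 168.0, 166.0, 172.0]
xbar, ybar, sx, sy, r = summary_stats(xs, ys)
```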
Model: The bivariate normal distribution (= population distribution)

If (X, Y) ~ N(μ_x, μ_y, σ_x², σ_y², ρ), i.e., bivariate normally distributed with the 5 parameters (population quantities)

    μ_x = E(X),  μ_y = E(Y),  σ_x = SD(X),  σ_y = SD(Y),  ρ = corr(X, Y),

the joint pdf is (see Rice p. 81)

(1)   f(x, y) = [1 / (2π σ_x σ_y √(1−ρ²))]
              × exp{ −[1/(2(1−ρ²))] [ (x−μ_x)²/σ_x² − 2ρ (x−μ_x)(y−μ_y)/(σ_x σ_y) + (y−μ_y)²/σ_y² ] }

for −∞ < x < ∞ and −∞ < y < ∞.

Fitting to data. It can be shown that (practically) the best-fitting (in the maximum likelihood sense) estimate of this pdf is obtained by substituting (x̄, ȳ, s_x, s_y, r) for the parameters (μ_x, μ_y, σ_x, σ_y, ρ). The result is shown in the following contour plot produced by Maple(1). Note that the maximum of the estimated pdf is attained at the point (x, y) = (166.9, 167.6), which is the sample estimate of the population means (μ_x, μ_y).

Figure 2  Contour plot of the best fitting joint normal distribution

(1) It appears that Stata cannot produce contour plots like this (as far as I know).
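Formula (1) is easy to evaluate directly. The Python sketch below (my own illustration; the original note used Maple for the plotting) codes the density and checks that it peaks at (μ_x, μ_y) = (x̄, ȳ); with a plotting library one could feed it to a contour routine to reproduce Figure 2:

```python
import math

def binorm_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density, formula (1)."""
    q = ((x - mx) ** 2 / sx ** 2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my) ** 2 / sy ** 2)
    c = 2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2)
    return math.exp(-q / (2 * (1 - rho ** 2))) / c

# Fitted values from the data summary: (xbar, ybar, s_x, s_y, r)
peak = binorm_pdf(166.9, 167.6, 166.9, 167.6, 5.83, 5.5938, 0.36)
```

At the point of means the exponent vanishes, so `peak` equals the normalizing constant 1/(2π s_x s_y √(1−r²)).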
Important properties of the bivariate normal distribution

Property 1
The distribution is symmetric in all directions, with the highest concentration of observations at the center. The contours (i.e., the curves where the pdf is constant) are ellipses. If the scatter plot of observations of (X, Y) does not show symmetry of this kind, the bivariate normal model is not realistic.

Property 2
If (X, Y) ~ bivariate (joint) normal (= N(μ_x, μ_y, σ_x², σ_y², ρ)), the marginal distributions are both normal: X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²). For example,

    f_X(x) = ∫ f(x, y) dy = [1/(σ_x √(2π))] e^{−(x−μ_x)²/(2σ_x²)}    (see Rice p. 82, optional reading)

    f_Y(y) = ∫ f(x, y) dx = [1/(σ_y √(2π))] e^{−(y−μ_y)²/(2σ_y²)}    (similarly)

Note. The other way does not hold! Even if X and Y are both normally distributed marginally, the joint distribution is not necessarily bivariate normal (see e.g., Rice p. 84).

Property 3
If (X, Y) is bivariate normal and the correlation is zero (ρ = 0), then X and Y are (stochastically) independent!

Proof. If ρ = 0, the general expression (1) reduces to

    f(x, y) = [1/(σ_x √(2π))] e^{−(x−μ_x)²/(2σ_x²)} · [1/(σ_y √(2π))] e^{−(y−μ_y)²/(2σ_y²)} = f_X(x) f_Y(y)

which implies that X and Y are independent. (End of proof)

Note. This is a special feature of the joint normal distribution. In general, zero correlation does not imply that X and Y are independent! So, zero correlation can, in general, be looked upon as a weaker form of lack of dependence between X and Y than stochastic independence (which is the strongest form).

Property 4
If (X, Y) is bivariate normal, both regressions (Y w.r.t. X and X w.r.t. Y) are automatically linear and homoscedastic. In addition, the two conditional distributions are both normal.
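Property 3 can be checked numerically: with ρ = 0 the joint density (1) equals the product of the two normal marginals at every point. A small Python check (my own illustration, with arbitrarily chosen parameter values):

```python
import math

def binorm_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density, formula (1)."""
    q = ((x - mx) ** 2 / sx ** 2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my) ** 2 / sy ** 2)
    c = 2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2)
    return math.exp(-q / (2 * (1 - rho ** 2))) / c

def norm_pdf(t, m, s):
    """Univariate N(m, s^2) density (the marginal of Property 2)."""
    return math.exp(-(t - m) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

# rho = 0: the joint density factorizes, f(x, y) = f_X(x) * f_Y(y)
joint = binorm_pdf(1.2, -0.7, 0.0, 0.0, 1.5, 2.0, 0.0)
product = norm_pdf(1.2, 0.0, 1.5) * norm_pdf(-0.7, 0.0, 2.0)
```

With any ρ ≠ 0 the same comparison fails, consistent with factorization being special to ρ = 0.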
Having the joint pdf, f(x, y), and the marginal one, f_X(x), we can calculate the conditional pdf for Y given X = x:

(2)   f(y | x) = f(x, y) / f_X(x) = (messy algebra, can be skipped)
              = [1/(σ_y √(1−ρ²) √(2π))] exp{ −[y − μ_y − ρ(σ_y/σ_x)(x − μ_x)]² / (2σ_y²(1−ρ²)) }

which we recognize as a normal pdf with expectation E(Y | x) = μ_y + ρ(σ_y/σ_x)(x − μ_x) and variance var(Y | x) = σ_y²(1 − ρ²). In particular, the conditional distribution of Y given X = x is normal, N(E(Y | x), var(Y | x)), with regression function (cf. Rice p. 148)

    μ(x) = E(Y | X = x) = μ_y + ρ(σ_y/σ_x)(x − μ_x)    (i.e., a linear function of x)

and variance function

    σ²(x) = var(Y | X = x) = σ_y²(1 − ρ²)    (i.e., constant)

Important result. Hence, if we can reasonably assume that the joint distribution of X and Y is bivariate normal, it follows automatically (without extra assumptions) that the regression of Y w.r.t. X is linear and homoscedastic.

Substituting the estimates we have for (μ_x, μ_y, σ_x, σ_y, ρ) in μ(x) and σ²(x), we get the estimated regression (which, in fact, is the same as the OLS estimate from the basic course)

    μ̂(x) = 167.6 + (0.36)(5.5938/5.83)(x − 166.9) = 109.86 + 0.3458·x
    σ̂²(x) = (5.5938)²(1 − (0.36)²) = (5.2187)²

and the estimated conditional distribution is normal:

    (Y | x) ~ N(109.86 + 0.3458·x, (5.2187)²)    (estimated)
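The substitution step can be reproduced from the summary statistics alone. A Python sketch using the values from the text (note that the printed coefficients 0.3458 and 109.86 were evidently computed with an unrounded correlation, so recomputing from r = 0.36 reproduces them only to roughly three decimals):

```python
import math

xbar, ybar = 166.9, 167.6          # sample means: mothers, daughters
sx, sy, r = 5.83, 5.5938, 0.36     # sample SDs and sample correlation

slope = r * sy / sx                # estimate of rho * sigma_y / sigma_x
intercept = ybar - slope * xbar    # estimate of mu_y - slope * mu_x
cond_sd = sy * math.sqrt(1 - r ** 2)   # sqrt of sigma_y^2 * (1 - rho^2)

def mu_hat(x):
    """Estimated regression function E(Y | X = x)."""
    return intercept + slope * x
```

A useful sanity check: the estimated line always passes through the point of means (x̄, ȳ).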
Figure 3  Contour plot of the best fitting joint normal distribution and the implied (OLS estimated) regression function

Historical note on the term regression. The term was used by geneticists in the beginning of the last century. They observed the phenomenon that when a parent has a height (say) away from the average height in the population, the offspring tends to have a height closer to the average height (i.e., a regression towards the mean). This tendency was confirmed by regression analysis.

Illustration: Our estimated population mean (for mothers) is μ̂_x = 166.9. Consider a mother with height 172 cm (5.1 cm above the population mean). The mean height for daughters of such mothers is estimated as

    μ̂(172) = 109.86 + (0.3458)·172 ≈ 169.4    (i.e., 2.5 cm from the population mean)

A mother of 162 cm gives

    μ̂(162) = 109.86 + (0.3458)·162 ≈ 165.9    (i.e., 1.0 cm from the population mean)
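The regression-toward-the-mean illustration is plain arithmetic on the estimated line; a quick Python check using the fitted coefficients from the text:

```python
def mu_hat(x):
    """Estimated regression line from the text: 109.86 + 0.3458 x."""
    return 109.86 + 0.3458 * x

xbar = 166.9  # estimated population mean for mothers

# A mother's deviation from the mean shrinks in the daughters' predicted mean
for mother in (172.0, 162.0):
    daughter = mu_hat(mother)
    assert abs(daughter - xbar) < abs(mother - xbar)  # regression toward the mean
```

The assertion inside the loop is exactly the "regression" phenomenon: the predicted daughter height lies strictly closer to the mean than the mother's height, because the slope 0.3458 is below 1.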
Example of zero correlation when X and Y are dependent.

Let U, X be independent rv's where X ~ N(0, 1) and U ~ uniform[−0.5, 0.5], and let Y = X² + U.

100 simulated observations of (X, Y), generated and plotted by Stata, are:

. set obs 100
obs was 0, now 100
. gen u=runiform()-.5    // runiform() generates uniform over (0, 1)
. gen x=rnormal()        // rnormal() generates N(0,1) observations
. gen y=x^2+u
. scatter y x

[Figure: scatter plot of y against x]

The plot shows strong dependence between X and Y.

Regression of Y w.r.t. X: with x fixed,

    μ(x) = E(Y | X = x) = E(X² + U | X = x) = x² + E(U | X = x) = x² + E(U) = x²

since U, X independent implies E(U | x) = E(U) = 0 and var(U | x) = var(U) = 1/12 (remember that independence ⇒ f(u | x) = f_U(u)).

Hence the regression function becomes μ(x) = E(Y | X = x) = x².
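The same experiment can be run without Stata. Below is a Python translation of the simulation (my own sketch, not from the original note); I use a much larger sample than the 100 Stata draws so that the sample correlation settles near its population value 0, and a fixed seed for reproducibility:

```python
import random

rng = random.Random(42)   # fixed seed for reproducibility
n = 100_000               # large n, so the sample correlation is close to 0

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]        # X ~ N(0, 1)
ys = [x * x + rng.uniform(-0.5, 0.5) for x in xs]   # Y = X^2 + U, U ~ uniform[-0.5, 0.5]

xbar = sum(xs) / n
ybar = sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
sx = (sum((x - xbar) ** 2 for x in xs) / (n - 1)) ** 0.5
sy = (sum((y - ybar) ** 2 for y in ys) / (n - 1)) ** 0.5
r = sxy / (sx * sy)   # near 0, even though Y is a function of X plus noise
```

With only n = 100 (as in the Stata run) the sample correlation fluctuates much more, which is why it is the scatter plot, not r, that reveals the dependence.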
Scatterplot with regression function. Stata command:

twoway (scatter y x) (function y=x^2, range(-3 2))

[Figure: scatter plot of y against x with the regression function E(Y | x) = x² superimposed]

For the variance function, with x fixed,

    var(Y | X = x) = var(X² + U | X = x) = var(x² + U | x) = var(U | x) = var(U) = 1/12

showing that the variance function is constant, σ²(x) = var(Y | X = x) = 1/12. So the relation is non-linear (for the regression function) and homoscedastic. However, the correlation between X and Y is zero (in spite of the strong dependence)!

Proof: The correlation coefficient is, by definition, ρ(X, Y) = cov(X, Y) / (SD(X) SD(Y)), where the covariance becomes

    cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y) = E(XY)    (since X ~ N(0, 1) ⇒ E(X) = 0)

Now U, X independent ⇒

    E(XY) = E(X(X² + U)) = E(X³ + XU) = E(X³) + E(X)E(U) = E(X³)

We have

    E(X³) = ∫ x³ [1/√(2π)] e^{−x²/2} dx = ∫ g(x) dx,

where we have called the integrand g(x). We see that g(x) is symmetric about the origin (i.e., g(−x) = −g(x) for all x),
implying that the area A = −B, and therefore the total area ∫ g(x) dx = A + B = 0. Hence cov(X, Y) = E(X³) = 0, and therefore also the correlation = 0. (End of proof)

[Proof that E(XU) = (E X)(E U) when X and U are independent:

Suppose (X, U) has joint pdf f(x, u) and marginal pdfs f_X(x) and f_U(u). Independence implies f(x, u) = f_X(x) f_U(u).

To find E(XU), we need the rule given in Theorem B (Rice p. 13), which implies that, if h(x, u) is an arbitrary function, then

    E[h(X, U)] = ∫∫ h(x, u) f(x, u) dx du    (whenever the integral exists)

Hence

    E(XU) = ∫∫ x·u f(x, u) dx du = ∫∫ x·u f_X(x) f_U(u) dx du
          = ∫ u f_U(u) [∫ x f_X(x) dx] du = ∫ u f_U(u) E(X) du
          = E(X) ∫ u f_U(u) du = E(X) E(U)

(End of proof)]
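Both zero-expectation arguments above (the odd integrand g(x) = x³·φ(x), whose areas A and B cancel, and the factorization E(XU) = E(X)E(U)) can be confirmed by simple numerical integration. A Python sketch using a midpoint rule (my own illustration; truncating the normal integrals at ±8 is numerically negligible):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# E(X^3): the integrand g(x) = x^3 * phi(x) is odd, so the areas cancel
ex3 = integrate(lambda x: x ** 3 * phi(x), -8.0, 8.0)

# E(XU) for independent X ~ N(0,1), U ~ uniform[-0.5, 0.5]: the double
# integral factorizes into E(X) * E(U) = 0 * 0  (density of U is 1 on [-0.5, 0.5])
exu = integrate(lambda x: x * phi(x), -8.0, 8.0) * integrate(lambda u: u, -0.5, 0.5)
```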