PART 1: Regression Analysis with Cross-Sectional Data

Part 1 of the text covers regression analysis with cross-sectional data. It builds upon a solid base of college algebra and basic concepts in probability and statistics. Appendices A, B, and C contain complete reviews of these topics.

Chapter 2 begins with the simple linear regression model, where we explain one variable in terms of another variable. Although simple regression is not widely used in applied econometrics, it is used occasionally and serves as a natural starting point because the algebra and interpretations are relatively straightforward.

Chapters 3 and 4 cover the fundamentals of multiple regression analysis, where we allow more than one variable to affect the variable we are trying to explain. Multiple regression is still the most commonly used method in empirical research, and so these chapters deserve careful attention. Chapter 3 focuses on the algebra of the method of ordinary least squares (OLS), while also establishing conditions under which the OLS estimator is unbiased and best linear unbiased. Chapter 4 covers the important topic of statistical inference.

Chapter 5 discusses the large sample, or asymptotic, properties of the OLS estimators. This provides justification of the inference procedures in Chapter 4 when the errors in a regression model are not normally distributed. Chapter 6 covers some additional topics in regression analysis, including advanced functional form issues, data scaling, prediction, and goodness-of-fit. Chapter 7 explains how qualitative information can be incorporated into multiple regression models.

Chapter 8 illustrates how to test for and correct the problem of heteroskedasticity, or nonconstant variance, in the error terms. We show how the usual OLS statistics can be adjusted, and we also present an extension of OLS, known as weighted least squares, that explicitly accounts for different variances in the errors. Chapter 9 delves further into the very important problem of correlation between the error term and one or more of the explanatory variables.
We demonstrate how the availability of a proxy variable can solve the omitted variables problem. In addition, we establish the bias and inconsistency in the OLS estimators in the presence of certain kinds of measurement errors in the variables. Various data problems are also discussed, including the problem of outliers.

Copyright 2012 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

CHAPTER 2: The Simple Regression Model

The simple regression model can be used to study the relationship between two variables. For reasons we will see, the simple regression model has limitations as a general tool for empirical analysis. Nevertheless, it is sometimes appropriate as an empirical tool. Learning how to interpret the simple regression model is good practice for studying multiple regression, which we will do in subsequent chapters.

2.1 Definition of the Simple Regression Model

Much of applied econometric analysis begins with the following premise: y and x are two variables, representing some population, and we are interested in "explaining y in terms of x," or in "studying how y varies with changes in x." We discussed some examples in Chapter 1, including: y is soybean crop yield and x is amount of fertilizer; y is hourly wage and x is years of education; and y is a community crime rate and x is number of police officers.

In writing down a model that will "explain y in terms of x," we must confront three issues. First, since there is never an exact relationship between two variables, how do we allow for other factors to affect y? Second, what is the functional relationship between y and x? And third, how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation relating y to x. A simple equation is

$y = \beta_0 + \beta_1 x + u$. [2.1]

Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear regression model. It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables x and y. We now discuss the meaning of each of the quantities in (2.1). [Incidentally, the term "regression" has origins that are not especially important for most modern econometric applications, so we will not explain it here. See Stigler (1986) for an engaging history of regression analysis.]
When related by (2.1), the variables y and x have several different names used interchangeably, as follows: y is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand; x is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor. (The term covariate is also used for x.) The terms "dependent variable" and "independent variable" are frequently used in econometrics. But be aware that the label "independent" here does not refer to the statistical notion of independence between random variables (see Appendix B).

The terms "explained" and "explanatory" variables are probably the most descriptive. "Response" and "control" are used mostly in the experimental sciences, where the variable x is under the experimenter's control. We will not use the terms "predicted variable" and "predictor," although you sometimes see these in applications that are purely about prediction and not causality. Our terminology for simple regression is summarized in Table 2.1.

TABLE 2.1 Terminology for Simple Regression

y                      x
Dependent variable     Independent variable
Explained variable     Explanatory variable
Response variable      Control variable
Predicted variable     Predictor variable
Regressand             Regressor

Cengage Learning, 2013

The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. You can usefully think of u as standing for "unobserved."

Equation (2.1) also addresses the issue of the functional relationship between y and x. If the other factors in u are held fixed, so that the change in u is zero, $\Delta u = 0$, then x has a linear effect on y:

$\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$. [2.2]

Thus, the change in y is simply $\beta_1$ multiplied by the change in x. This means that $\beta_1$ is the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics. The intercept parameter $\beta_0$, sometimes called the constant term, also has its uses, although it is rarely central to an analysis.
EXAMPLE 2.1 SOYBEAN YIELD AND FERTILIZER

Suppose that soybean yield is determined by the model

$\mathit{yield} = \beta_0 + \beta_1 \mathit{fertilizer} + u$, [2.3]

so that y = yield and x = fertilizer. The agricultural researcher is interested in the effect of fertilizer on yield, holding other factors fixed. This effect is given by $\beta_1$. The error term u contains factors such as land quality, rainfall, and so on. The coefficient $\beta_1$ measures the effect of fertilizer on yield, holding other factors fixed: $\Delta \mathit{yield} = \beta_1 \Delta \mathit{fertilizer}$.
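The "holding other factors fixed" reading of the slope can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the parameter values `BETA0` and `BETA1`, the fertilizer amounts, and the values of u are all hypothetical and do not come from the text.

```python
# Hypothetical parameter values, chosen only for illustration.
BETA0 = 20.0   # intercept
BETA1 = 0.5    # effect of one more unit of fertilizer on yield


def yield_model(fertilizer, u):
    """Population model (2.3): yield = beta0 + beta1 * fertilizer + u."""
    return BETA0 + BETA1 * fertilizer + u


# Same unobserved factors (u held fixed), different fertilizer amounts:
# the difference in yield is beta1 times the change in fertilizer.
same_u = yield_model(12.0, u=3.0) - yield_model(10.0, u=3.0)

# Different fertilizer AND different unobserved factors: the difference
# in yield now mixes the fertilizer effect with the change in u.
diff_u = yield_model(12.0, u=5.0) - yield_model(10.0, u=3.0)

print(same_u)  # beta1 * 2
print(diff_u)  # beta1 * 2 plus the change in u
```

Only the first comparison isolates the slope parameter. The second is what we would see when comparing two plots that also differ in land quality or rainfall.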

EXAMPLE 2.2 A SIMPLE WAGE EQUATION

A model relating a person's wage to observed education and other unobserved factors is

$\mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + u$. [2.4]

If wage is measured in dollars per hour and educ is years of education, then $\beta_1$ measures the change in hourly wage given another year of education, holding all other factors fixed. Some of those factors include labor force experience, innate ability, tenure with current employer, work ethic, and numerous other things.

The linearity of (2.1) implies that a one-unit change in x has the same effect on y, regardless of the initial value of x. This is unrealistic for many economic applications. For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than did the previous year. We will see how to allow for such possibilities in Section 2.4.

The most difficult issue to address is whether model (2.1) really allows us to draw ceteris paribus conclusions about how x affects y. We just saw in equation (2.2) that $\beta_1$ does measure the effect of x on y, holding all other factors (in u) fixed. Is this the end of the causality issue? Unfortunately, no. How can we hope to learn in general about the ceteris paribus effect of x on y, holding other factors fixed, when we are ignoring all those other factors?

Section 2.5 will show that we are only able to get reliable estimators of $\beta_0$ and $\beta_1$ from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x. Without such a restriction, we will not be able to estimate the ceteris paribus effect, $\beta_1$. Because u and x are random variables, we need a concept grounded in probability.

Before we state the key assumption about how x and u are related, we can always make one assumption about u. As long as the intercept $\beta_0$ is included in the equation, nothing is lost by assuming that the average value of u in the population is zero. Mathematically,

$\mathrm{E}(u) = 0$. [2.5]

Assumption (2.5) says nothing about the relationship between u and x, but simply makes a statement about the distribution of the unobserved factors in the population. Using the previous examples for illustration, we can see that assumption (2.5) is not very restrictive. In Example 2.1, we lose nothing by normalizing the unobserved factors affecting soybean yield, such as land quality, to have an average of zero in the population of all cultivated plots. The same is true of the unobserved factors in Example 2.2. Without loss of generality, we can assume that things such as average ability are zero in the population of all working people. If you are not convinced, you should work through Problem 2 to see that we can always redefine the intercept in equation (2.1) to make (2.5) true.

We now turn to the crucial assumption regarding how u and x are related. A natural measure of the association between two random variables is the correlation coefficient. (See Appendix B for definition and properties.) If u and x are uncorrelated, then, as random variables, they are not linearly related. Assuming that u and x are uncorrelated goes a long way toward defining the sense in which u and x should be unrelated in equation (2.1). But it does not go far enough, because correlation measures only linear dependence between u and x. Correlation has a somewhat counterintuitive feature: it is possible for u to be uncorrelated with x while being correlated with functions of x, such as $x^2$. (See Section B.4 for further discussion.) This possibility is not acceptable for most regression purposes, as it causes problems for interpreting the model and for deriving statistical properties. A better assumption involves the expected value of u given x.

Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x. The crucial assumption is that the average value of u does not depend on the value of x. We can write this assumption as

$\mathrm{E}(u|x) = \mathrm{E}(u)$. [2.6]

Equation (2.6) says that the average value of the unobservables is the same across all slices of the population determined by the value of x and that the common average is necessarily equal to the average of u over the entire population. When assumption (2.6) holds, we say that u is mean independent of x. (Of course, mean independence is implied by full independence between u and x, an assumption often used in basic probability and statistics.) When we combine mean independence with assumption (2.5), we obtain the zero conditional mean assumption, $\mathrm{E}(u|x) = 0$. It is critical to remember that equation (2.6) is the assumption with impact; assumption (2.5) essentially defines the intercept, $\beta_0$.

Let us see what (2.6) entails in the wage example. To simplify the discussion, assume that u is the same as innate ability. Then (2.6) requires that the average level of ability is the same regardless of years of education. For example, if $\mathrm{E}(\mathit{abil}|8)$ denotes the average ability for the group of all people with eight years of education, and $\mathrm{E}(\mathit{abil}|16)$ denotes the average ability among people in the population with sixteen years of education, then (2.6) implies that these must be the same. In fact, the average ability level must be the same for all education levels. If, for example, we think that average ability increases with years of education, then (2.6) is false. (This would happen if, on average, people with more ability choose to become more educated.)
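Returning to the distinction between zero correlation and mean independence, a small exact calculation may help. The sketch below uses a hypothetical three-point population (x equal to -1, 0, or 1 with equal probability) and a hypothetical error u = x^2 - E(x^2); neither comes from the text. Exact fractions are used so that the zeros are exact and not rounding artifacts.

```python
from fractions import Fraction

# Hypothetical discrete population: x takes -1, 0, 1 with equal probability,
# and the "error" is a centered function of x, u = x^2 - E(x^2).
support = [-1, 0, 1]
prob = Fraction(1, 3)


def expect(f):
    """Population expectation of f(x) over the three equally likely values."""
    return sum(prob * f(x) for x in support)


mean_x2 = expect(lambda x: x * x)


def u(x):
    """Error term, centered so that E(u) = 0."""
    return x * x - mean_x2


mean_u = expect(u)
cov_x_u = expect(lambda x: x * u(x)) - expect(lambda x: x) * mean_u
cov_x2_u = expect(lambda x: x * x * u(x)) - mean_x2 * mean_u

# E(u | x) at each value of x: u is a function of x here, so it is just u(x).
cond_mean_u = {x: u(x) for x in support}

print(mean_u, cov_x_u, cov_x2_u)
print(cond_mean_u)
```

In this constructed population, (2.5) holds and u is uncorrelated with x, yet u is correlated with $x^2$ and the conditional mean of u changes with x, so (2.6) fails.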
As we cannot observe innate ability, we have no way of knowing whether or not average ability is the same for all education levels. But this is an issue that we must address before relying on simple regression analysis.

EXPLORING FURTHER 2.1
Suppose that a score on a final exam, score, depends on classes attended (attend) and unobserved factors that affect exam performance (such as student ability). Then

$\mathit{score} = \beta_0 + \beta_1 \mathit{attend} + u$. [2.7]

When would you expect this model to satisfy (2.6)?

In the fertilizer example, if fertilizer amounts are chosen independently of other features of the plots, then (2.6) will hold: the average land quality will not depend on the amount of fertilizer. However, if more fertilizer is put on the higher-quality plots of land, then the expected value of u changes with the level of fertilizer, and (2.6) fails.

The zero conditional mean assumption gives $\beta_1$ another interpretation that is often useful. Taking the expected value of (2.1) conditional on x and using $\mathrm{E}(u|x) = 0$ gives

$\mathrm{E}(y|x) = \beta_0 + \beta_1 x$. [2.8]

Equation (2.8) shows that the population regression function (PRF), $\mathrm{E}(y|x)$, is a linear function of x. The linearity means that a one-unit increase in x changes the expected value of y by the amount $\beta_1$. For any given value of x, the distribution of y is centered about $\mathrm{E}(y|x)$, as illustrated in Figure 2.1.

It is important to understand that equation (2.8) tells us how the average value of y changes with x; it does not say that y equals $\beta_0 + \beta_1 x$ for all units in the population.

FIGURE 2.1 $\mathrm{E}(y|x)$ as a linear function of x. [Figure not reproduced: y on the vertical axis; x on the horizontal axis with the values $x_1$, $x_2$, $x_3$ marked; the line $\mathrm{E}(y|x) = \beta_0 + \beta_1 x$. Cengage Learning, 2013]

For example, suppose that x is the high school grade point average and y is the college GPA, and we happen to know that $\mathrm{E}(\mathit{colGPA}|\mathit{hsGPA}) = 1.5 + 0.5\,\mathit{hsGPA}$. [Of course, in practice, we never know the population intercept and slope, but it is useful to pretend momentarily that we do to understand the nature of equation (2.8).] This GPA equation tells us the average college GPA among all students who have a given high school GPA. So suppose that hsGPA = 3.6. Then the average colGPA for all high school graduates who attend college with hsGPA = 3.6 is 1.5 + 0.5(3.6) = 3.3. We are certainly not saying that every student with hsGPA = 3.6 will have a 3.3 college GPA; this is clearly false. The PRF gives us a relationship between the average level of y at different levels of x. Some students with hsGPA = 3.6 will have a college GPA higher than 3.3, and some will have a lower college GPA. Whether the actual colGPA is above or below 3.3 depends on the unobserved factors in u, and those differ among students even within the slice of the population with hsGPA = 3.6.

Given the zero conditional mean assumption $\mathrm{E}(u|x) = 0$, it is useful to view equation (2.1) as breaking y into two components. The piece $\beta_0 + \beta_1 x$, which represents $\mathrm{E}(y|x)$, is called the systematic part of y (that is, the part of y explained by x), and u is called the unsystematic part, or the part of y not explained by x. In Chapter 3, when we introduce more than one explanatory variable, we will discuss how to determine how large the systematic part is relative to the unsystematic part.

In the next section, we will use assumptions (2.5) and (2.6) to motivate estimators of $\beta_0$ and $\beta_1$ given a random sample of data. The zero conditional mean assumption also plays a crucial role in the statistical analysis in Section 2.6.
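The split of y into a systematic and an unsystematic part can be seen in a small simulation of the GPA example. This is a sketch under stated assumptions: the intercept 1.5 and slope 0.5 are taken as the PRF that reproduces the 3.3 average at hsGPA = 3.6, and the normal distribution of u with standard deviation 0.4 is purely hypothetical (nothing in the text specifies it; simulated GPAs are also not capped at 4.0).

```python
import random
import statistics

random.seed(0)

# Assumed PRF, consistent with the 3.3 average at hsGPA = 3.6 in the text.
BETA0, BETA1 = 1.5, 0.5


def draw_col_gpa(hs_gpa):
    """One student's college GPA: systematic part plus an unobserved part u.

    u is drawn from a mean-zero normal distribution that does not depend
    on hs_gpa, a hypothetical stand-in for the assumption E(u|x) = 0.
    """
    u = random.gauss(0.0, 0.4)
    return BETA0 + BETA1 * hs_gpa + u


# The slice of the population with hsGPA = 3.6.
slice_36 = [draw_col_gpa(3.6) for _ in range(20000)]

avg = statistics.fmean(slice_36)
share_above = sum(g > 3.3 for g in slice_36) / len(slice_36)

print(round(avg, 2))          # close to 1.5 + 0.5 * 3.6
print(round(share_above, 2))  # roughly half of the slice lies above 3.3
```

The average within the slice tracks the PRF, while individual students land on either side of it because their values of u differ.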

2.2 Deriving the Ordinary Least Squares Estimates

Now that we have discussed the basic ingredients of the simple regression model, we will address the important issue of how to estimate the parameters $\beta_0$ and $\beta_1$ in equation (2.1). To do this, we need a sample from the population. Let $\{(x_i, y_i): i = 1, \ldots, n\}$ denote a random sample of size n from the population. Because these data come from (2.1), we can write

$y_i = \beta_0 + \beta_1 x_i + u_i$ [2.9]

for each i. Here, $u_i$ is the error term for observation i because it contains all factors affecting $y_i$ other than $x_i$.

As an example, $x_i$ might be the annual income and $y_i$ the annual savings for family i during a particular year. If we have collected data on fifteen families, then n = 15. A scatterplot of such a data set is given in Figure 2.2, along with the (necessarily fictitious) population regression function.

FIGURE 2.2 Scatterplot of savings and income for 15 families, and the population regression $\mathrm{E}(\mathit{savings}|\mathit{income}) = \beta_0 + \beta_1 \mathit{income}$. [Figure not reproduced: savings on the vertical axis, income on the horizontal axis. Cengage Learning, 2013]

We must decide how to use these data to obtain estimates of the intercept and slope in the population regression of savings on income.

There are several ways to motivate the following estimation procedure. We will use (2.5) and an important implication of assumption (2.6): in the population, u is uncorrelated with x. Therefore, we see that u has zero expected value and that the covariance between x and u is zero:

$\mathrm{E}(u) = 0$ [2.10]

and

$\mathrm{Cov}(x,u) = \mathrm{E}(xu) = 0$, [2.11]

where the first equality in (2.11) follows from (2.10). (See Section B.4 for the definition and properties of covariance.) In terms of the observable variables x and y and the unknown parameters $\beta_0$ and $\beta_1$, equations (2.10) and (2.11) can be written as

$\mathrm{E}(y - \beta_0 - \beta_1 x) = 0$ [2.12]

and

$\mathrm{E}[x(y - \beta_0 - \beta_1 x)] = 0$, [2.13]

respectively. Equations (2.12) and (2.13) imply two restrictions on the joint probability distribution of (x, y) in the population. Since there are two unknown parameters to estimate, we might hope that equations (2.12) and (2.13) can be used to obtain good estimators of $\beta_0$ and $\beta_1$. In fact, they can be. Given a sample of data, we choose estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ to solve the sample counterparts of (2.12) and (2.13):

$n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ [2.14]

and

$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$. [2.15]

This is an example of the method of moments approach to estimation. (See Section C.4 for a discussion of different estimation approaches.) These equations can be solved for $\hat{\beta}_0$ and $\hat{\beta}_1$.

Using the basic properties of the summation operator from Appendix A, equation (2.14) can be rewritten as

$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$, [2.16]

where $\bar{y} = n^{-1} \sum_{i=1}^{n} y_i$ is the sample average of the $y_i$ and likewise for $\bar{x}$. This equation allows us to write $\hat{\beta}_0$ in terms of $\hat{\beta}_1$, $\bar{y}$, and $\bar{x}$:

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. [2.17]

Therefore, once we have the slope estimate $\hat{\beta}_1$, it is straightforward to obtain the intercept estimate $\hat{\beta}_0$, given $\bar{y}$ and $\bar{x}$.

Dropping the $n^{-1}$ in (2.15) (since it does not affect the solution) and plugging (2.17) into (2.15) yields

$\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$,

which, upon rearrangement, gives

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$.

From basic properties of the summation operator [see (A.7) and (A.8)],

$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.
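The two summation identities used in the last step both follow from the fact that deviations from a sample mean sum to zero, so that $\bar{x} \sum (x_i - \bar{x}) = 0$ and $\bar{x} \sum (y_i - \bar{y}) = 0$. A quick numerical check on a hypothetical four-observation sample (the data values are made up; exact fractions avoid rounding) might look like this:

```python
from fractions import Fraction

# Hypothetical sample, stored as exact fractions so the identities
# can be checked with == and no floating-point tolerance is needed.
x = [Fraction(v) for v in (1, 2, 4, 7)]
y = [Fraction(v) for v in (3, 1, 6, 8)]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Left- and right-hand sides of the two identities.
lhs_xx = sum(xi * (xi - x_bar) for xi in x)
rhs_xx = sum((xi - x_bar) ** 2 for xi in x)

lhs_xy = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
rhs_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Both identities rest on the deviations from the mean summing to zero.
sum_dev_x = sum(xi - x_bar for xi in x)
sum_dev_y = sum(yi - y_bar for yi in y)

print(lhs_xx == rhs_xx, lhs_xy == rhs_xy)
print(sum_dev_x == 0, sum_dev_y == 0)
```

A check on one sample is of course not a proof; the algebra in Appendix A covers the general case.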

Therefore, provided that

$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$, [2.18]

the estimated slope is

$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$. [2.19]

Equation (2.19) is simply the sample covariance between x and y divided by the sample variance of x. (See Appendix C. Dividing both the numerator and the denominator by n - 1 changes nothing.) This makes sense because $\beta_1$ equals the population covariance divided by the variance of x when $\mathrm{E}(u) = 0$ and $\mathrm{Cov}(x,u) = 0$. An immediate implication is that if x and y are positively correlated in the sample, then $\hat{\beta}_1$ is positive; if x and y are negatively correlated, then $\hat{\beta}_1$ is negative.

Although the method for obtaining (2.17) and (2.19) is motivated by (2.6), the only assumption needed to compute the estimates for a particular sample is (2.18). This is hardly an assumption at all: (2.18) is true provided the $x_i$ in the sample are not all equal to the same value. If (2.18) fails, then we have either been unlucky in obtaining our sample from the population or we have not specified an interesting problem (x does not vary in the population). For example, if y = wage and x = educ, then (2.18) fails only if everyone in the sample has the same amount of education (for example, if everyone is a high school graduate; see Figure 2.3). If just one person has a different amount of education, then (2.18) holds, and the estimates can be computed.

FIGURE 2.3 A scatterplot of wage against education when $\mathit{educ}_i = 12$ for all i. [Figure not reproduced: wage on the vertical axis, educ on the horizontal axis, with every observation at educ = 12. Cengage Learning, 2013]
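Formulas (2.17) and (2.19), together with the check in (2.18), translate almost line for line into code. The function below is a minimal sketch, not a replacement for an econometrics package; the name `ols_simple` and the four-person data set are hypothetical.

```python
def ols_simple(x, y):
    """Intercept and slope from equations (2.17) and (2.19).

    Raises ValueError when condition (2.18) fails, that is, when the
    x values in the sample show no variation.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be nonempty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)      # denominator of (2.19)
    if sst_x == 0:
        raise ValueError("no variation in x: condition (2.18) fails")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / sst_x                            # equation (2.19)
    intercept = y_bar - slope * x_bar               # equation (2.17)
    return intercept, slope


# Hypothetical data: years of education and hourly wage for four people.
educ = [10, 12, 14, 16]
wage = [4.0, 5.5, 6.0, 8.5]
b0, b1 = ols_simple(educ, wage)
print(b0, b1)

# Everyone has 12 years of education: the slope cannot be computed.
try:
    ols_simple([12, 12, 12, 12], wage)
except ValueError as err:
    print("failed:", err)
```

The second call mirrors the situation in Figure 2.3: with no variation in educ, the denominator of (2.19) is zero and no slope estimate exists.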

The estimates given in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$. To justify this name, for any $\hat{\beta}_0$ and $\hat{\beta}_1$ define a fitted value for y when $x = x_i$ as

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. [2.20]

This is the value we predict for y when $x = x_i$ for the given intercept and slope. There is a fitted value for each observation in the sample. The residual for observation i is the difference between the actual $y_i$ and its fitted value:

$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$. [2.21]

Again, there are n such residuals. [These are not the same as the errors in (2.9), a point we return to in Section 2.5.] The fitted values and residuals are indicated in Figure 2.4.

FIGURE 2.4 Fitted values and residuals. [Figure not reproduced: y on the vertical axis, x on the horizontal axis; the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with $\hat{y}_i$ labeled as the fitted value and $\hat{u}_i$ labeled as the residual. Cengage Learning, 2013]

Now, suppose we choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to make the sum of squared residuals,

$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$, [2.22]

as small as possible. The appendix to this chapter shows that the conditions necessary for $(\hat{\beta}_0, \hat{\beta}_1)$ to minimize (2.22) are given exactly by equations (2.14) and (2.15), without $n^{-1}$. Equations (2.14) and (2.15) are often called the first order conditions for the OLS estimates, a term that comes from optimization using calculus (see Appendix A). From our previous calculations, we know that the solutions to the OLS first order conditions are given by (2.17) and (2.19). The name "ordinary least squares" comes from the fact that these estimates minimize the sum of squared residuals.
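The link between the first order conditions and the minimization problem can be checked numerically. The sketch below uses a hypothetical four-observation sample: it computes the OLS estimates from (2.17) and (2.19), confirms that the residuals satisfy (2.14) and (2.15) up to rounding, and compares the sum of squared residuals with its value at a few nearby intercept and slope pairs (the step size 0.1 is arbitrary).

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals (2.22) for a candidate intercept and slope."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
n = len(x)

# OLS estimates from (2.19) and (2.17).
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# First order conditions (2.14) and (2.15), without the 1/n factor.
foc_intercept = sum(residuals)
foc_slope = sum(xi * ui for xi, ui in zip(x, residuals))

# Nudging either estimate away from OLS should raise the SSR.
best = ssr(b0, b1, x, y)
nearby = [ssr(b0 + d0, b1 + d1, x, y)
          for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)
          if (d0, d1) != (0.0, 0.0)]

print(b0, b1)
print(foc_intercept, foc_slope)   # both zero up to rounding error
print(best < min(nearby))
```

Comparing against eight neighbors does not prove global optimality; that argument is the one given in the chapter appendix.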

When we view ordinary least squares as minimizing the sum of squared residuals, it is natural to ask: Why not minimize some other function of the residuals, such as the absolute values of the residuals? In fact, as we will discuss in the more advanced Section 9.4, minimizing the sum of the absolute values of the residuals is sometimes very useful. But it does have some drawbacks. First, we cannot obtain formulas for the resulting estimators; given a data set, the estimates must be obtained by numerical optimization routines. As a consequence, the statistical theory for estimators that minimize the sum of the absolute residuals is very complicated. Minimizing other functions of the residuals, say, the sum of the residuals each raised to the fourth power, has similar drawbacks. (We would never choose our estimates to minimize, say, the sum of the residuals themselves, as residuals large in magnitude but with opposite signs would tend to cancel out.) With OLS, we will be able to derive unbiasedness, consistency, and other important statistical properties relatively easily. Plus, as the motivation in equations (2.13) and (2.14) suggests, and as we will see in Section 2.5, OLS is suited for estimating the parameters appearing in the conditional mean function (2.8).

Once we have determined the OLS intercept and slope estimates, we form the OLS regression line:

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, [2.23]

where it is understood that $\hat{\beta}_0$ and $\hat{\beta}_1$ have been obtained using equations (2.17) and (2.19). The notation $\hat{y}$, read as "y hat," emphasizes that the predicted values from equation (2.23) are estimates. The intercept, $\hat{\beta}_0$, is the predicted value of y when x = 0, although in some cases it will not make sense to set x = 0. In those situations, $\hat{\beta}_0$ is not, in itself, very interesting. When using (2.23) to compute predicted values of y for various values of x, we must account for the intercept in the calculations. Equation (2.23) is also called the sample regression function (SRF) because it is the estimated version of the population regression function $\mathrm{E}(y|x) = \beta_0 + \beta_1 x$.
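Returning to the comparison with absolute residuals, the contrast between a closed-form estimator and one found by search can be sketched in code. The search below is a deliberately simple stand-in for the numerical routines mentioned above, not what software actually uses (least absolute deviations is normally solved by linear programming). It relies on a known property of that problem: with an intercept and one regressor, some minimizer passes through two of the data points. The function names and the five-point data set, with one outlier, are hypothetical.

```python
from itertools import combinations


def sum_abs_residuals(b0, b1, x, y):
    """Sum of absolute residuals for a candidate intercept and slope."""
    return sum(abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y))


def lad_by_enumeration(x, y):
    """Least absolute deviations fit by brute-force search.

    Tries the line through every pair of observations with distinct
    x values and keeps the one with the smallest sum of absolute residuals.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # vertical line, not a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum_abs_residuals(b0, b1, x, y)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols(x, y):
    """Closed-form OLS estimates from (2.17) and (2.19)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


# Hypothetical data: four points on the line y = x and one outlier.
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 14]

print("OLS:", ols(x, y))
print("LAD:", lad_by_enumeration(x, y))
```

On these data the two criteria give noticeably different lines, because squaring gives the outlier far more weight than taking absolute values does. The enumeration visits every pair of points, so it is only practical for small samples.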
It is important to remember that the PRF is something fixed, but unknown, in the population. Because the SRF is obtained for a given sample of data, a new sample will generate a different slope and intercept in equation (2.23).

In most cases, the slope estimate, which we can write as

$\hat{\beta}_1 = \Delta\hat{y} / \Delta x$, [2.24]

is of primary interest. It tells us the amount by which $\hat{y}$ changes when x increases by one unit. Equivalently,

$\Delta\hat{y} = \hat{\beta}_1 \Delta x$, [2.25]

so that given any change in x (whether positive or negative), we can compute the predicted change in y.

We now present several examples of simple regression obtained by using real data. In other words, we find the intercept and slope estimates with equations (2.17) and (2.19). Since these examples involve many observations, the calculations were done using an econometrics software package. At this point, you should be careful not to read too much into these regressions; they are not necessarily uncovering a causal relationship. We have said nothing so far about the statistical properties of OLS. In Section 2.5, we consider statistical properties after we explicitly impose assumptions on the population model equation (2.1).

EXAMPLE 2.3 CEO SALARY AND RETURN ON EQUITY

For the population of chief executive officers, let y be annual salary (salary) in thousands of dollars. Thus, y = 856.3 indicates an annual salary of $856,300, and y = 1,452.6 indicates a salary of $1,452,600. Let x be the average return on equity (roe) for the CEO's firm for the previous three years. (Return on equity is defined in terms of net income as a percentage of common equity.) For example, if roe = 10, then average return on equity is 10%.

To study the relationship between this measure of firm performance and CEO compensation, we postulate the simple model

$\mathit{salary} = \beta_0 + \beta_1 \mathit{roe} + u$.

The slope parameter $\beta_1$ measures the change in annual salary, in thousands of dollars, when return on equity increases by one percentage point. Because a higher roe is good for the company, we think $\beta_1 > 0$.

The data set CEOSAL1.RAW contains information on 209 CEOs for the year 1990; these data were obtained from Business Week (5/6/91). In this sample, the average annual salary is $1,281,120, with the smallest and largest being $223,000 and $14,822,000, respectively. The average return on equity for the years 1988, 1989, and 1990 is 17.18%, with the smallest and largest values being 0.5 and 56.3%, respectively.

Using the data in CEOSAL1.RAW, the OLS regression line relating salary to roe is

$\widehat{\mathit{salary}} = 963.191 + 18.501\,\mathit{roe}$ [2.26]
$n = 209$,

where the intercept and slope estimates have been rounded to three decimal places; we use "salary hat" to indicate that this is an estimated equation. How do we interpret the equation? First, if the return on equity is zero, roe = 0, then the predicted salary is the intercept, 963.191, which equals $963,191 since salary is measured in thousands. Next, we can write the predicted change in salary as a function of the change in roe: $\Delta\widehat{\mathit{salary}} = 18.501\,(\Delta\mathit{roe})$. This means that if the return on equity increases by one percentage point, $\Delta\mathit{roe} = 1$, then salary is predicted to change by about 18.5, or $18,500.
Because (2.26) is a linear equation, this is the estimated change regardless of the initial salary.

We can easily use (2.26) to compare predicted salaries at different values of roe. Suppose roe = 30. Then $\widehat{\mathit{salary}} = 963.191 + 18.501(30) = 1{,}518.221$ (in thousands of dollars), which is just over $1.5 million. However, this does not mean that a particular CEO whose firm had a roe = 30 earns $1,518,221. Many other factors affect salary. This is just our prediction from the OLS regression line (2.26). The estimated line is graphed in Figure 2.5, along with the population regression function $\mathrm{E}(\mathit{salary}|\mathit{roe})$. We will never know the PRF, so we cannot tell how close the SRF is to the PRF. Another sample of data will give a different regression line, which may or may not be closer to the population regression line.
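Predictions from an estimated line such as (2.26) are simple enough to script. In the sketch below the coefficients 963.191 and 18.501 are taken as the values implied by the predictions quoted in the text ($963,191 at roe = 0 and $1,518,221 at roe = 30); the function names are hypothetical.

```python
# Coefficients of the estimated line (2.26), salary in thousands of dollars.
# They are taken here as the values implied by the predictions quoted in the
# text ($963,191 at roe = 0 and $1,518,221 at roe = 30).
INTERCEPT = 963.191
SLOPE = 18.501


def predicted_salary(roe):
    """Predicted annual salary, in thousands of dollars, at a given roe (%)."""
    return INTERCEPT + SLOPE * roe


def predicted_change(delta_roe):
    """Predicted change in salary for a change in roe; the level of roe drops out."""
    return SLOPE * delta_roe


for roe in (0, 10, 30):
    print(roe, round(predicted_salary(roe) * 1000))   # predicted salary in dollars

print(round(predicted_change(1) * 1000))              # dollars per percentage point
```

Because the line is linear, `predicted_change` does not need to know the starting value of roe. These numbers are predictions for the average, not statements about any particular CEO.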

[FIGURE 2.5: The OLS regression line \(\widehat{salary} = 963.191 + 18.501\,roe\) and the (unknown) population regression function \(E(salary \mid roe) = \beta_0 + \beta_1 roe\). Axes: salary (vertical) against roe (horizontal). © Cengage Learning, 2013]

EXAMPLE 2.4 WAGE AND EDUCATION

For the population of people in the workforce in 1976, let y = wage, where wage is measured in dollars per hour. Thus, for a particular person, if wage = 6.75, the hourly wage is $6.75. Let x = educ denote years of schooling; for example, educ = 12 corresponds to a complete high school education. Since the average wage in the sample is $5.90, the Consumer Price Index indicates that this amount is equivalent to $19.06 in 2003 dollars.

Using the data in WAGE1.RAW where n = 526 individuals, we obtain the following OLS regression line (or sample regression function):

\[ \widehat{wage} = -0.90 + 0.54\,educ \]  [2.27]
n = 526.

We must interpret this equation with caution. The intercept of −0.90 literally means that a person with no education has a predicted hourly wage of −90¢ an hour. This, of course, is silly. It turns out that only 18 people in the sample of 526 have less than eight years of education. Consequently, it is not surprising that the regression line does poorly at very low levels of education. For a person with eight years of education, the predicted wage is \(\widehat{wage} = -0.90 + 0.54(8) = 3.42\), or $3.42 per hour (in 1976 dollars).

The slope estimate in (2.27) implies that one more year of education increases hourly wage by 54¢ an hour. Therefore, four more years of education increase the predicted wage by 4(0.54) = 2.16, or $2.16 per hour. These are fairly large effects. Because of the linear nature of (2.27), another year of education increases the wage by the same amount, regardless of the initial level of education. In Section 2.4, we discuss some methods that allow for nonconstant marginal effects of our explanatory variables.

EXPLORING FURTHER 2.2
The estimated wage from (2.27), when educ = 8, is $3.42 in 1976 dollars. What is this value in 2003 dollars? (Hint: You have enough information in Example 2.4 to answer this question.)

EXAMPLE 2.5 VOTING OUTCOMES AND CAMPAIGN EXPENDITURES

The file VOTE1.RAW contains data on election outcomes and campaign expenditures for 173 two-party races for the U.S. House of Representatives in 1988. There are two candidates in each race, A and B. Let votea be the percentage of the vote received by Candidate A and sharea be the percentage of total campaign expenditures accounted for by Candidate A. Many factors other than sharea affect the election outcome (including the quality of the candidates and possibly the dollar amounts spent by A and B). Nevertheless, we can estimate a simple regression model to find out whether spending more relative to one's challenger implies a higher percentage of the vote.

The estimated equation using the 173 observations is

\[ \widehat{votea} = 26.81 + 0.464\,sharea \]  [2.28]
n = 173.

This means that if Candidate A's share of spending increases by one percentage point, Candidate A receives almost one-half a percentage point (0.464) more of the total vote. Whether or not this is a causal effect is unclear, but it is not unbelievable. If sharea = 50, votea is predicted to be about 50, or half the vote.

EXPLORING FURTHER 2.3
In Example 2.5, what is the predicted vote for Candidate A if sharea = 60 (which means 60%)? Does this answer seem reasonable?
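The arithmetic in Example 2.4 can be checked with a short Python sketch. It assumes the coefficients quoted in the discussion of (2.27), an intercept of −0.90 and a slope of 0.54; the function name `predicted_wage` is made up for this illustration and does not come from the text.

```python
# Coefficients of the estimated line (2.27) as quoted in the text
# (wage in 1976 dollars per hour, educ in years of schooling).
INTERCEPT = -0.90
SLOPE = 0.54


def predicted_wage(educ: float) -> float:
    """Predicted hourly wage from the estimated line (hypothetical helper)."""
    return INTERCEPT + SLOPE * educ


wage_at_8 = predicted_wage(8)

# Because the line is linear, four more years of education add the same
# amount to the predicted wage whatever the starting level of education.
gain_from_12_to_16 = predicted_wage(16) - predicted_wage(12)
gain_from_0_to_4 = predicted_wage(4) - predicted_wage(0)

print(f"predicted wage at educ = 8: {wage_at_8:.2f}")
print(f"gain from 12 to 16 years:   {gain_from_12_to_16:.2f}")
print(f"gain from 0 to 4 years:     {gain_from_0_to_4:.2f}")
```

The two gains coincide because the slope is constant, which is the restriction that the nonconstant marginal effects of Section 2.4 relax.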
In some cases, regression analysis is not used to determine causality but to simply look at whether two variables are positively or negatively related, much like a standard correlation analysis. An example of this occurs in Computer Exercise C3, where you are asked to use data from Biddle and Hamermesh (1990) on time spent sleeping and working to investigate the tradeoff between these two factors.

A Note on Terminology

In most cases, we will indicate the estimation of a relationship through OLS by writing an equation such as (2.26), (2.27), or (2.28). Sometimes, for the sake of brevity, it is useful to indicate that an OLS regression has been run without actually writing out the equation. We will often indicate that equation (2.23) has been obtained by OLS in saying that we run the regression of

y on x,  [2.29]

or simply that we regress y on x. The positions of y and x in (2.29) indicate which is the dependent variable and which is the independent variable: we always regress the dependent variable on the independent variable. For specific applications, we replace y and x with their names. Thus, to obtain (2.26), we regress salary on roe, or to obtain (2.28), we regress votea on sharea.

When we use such terminology in (2.29), we will always mean that we plan to estimate the intercept, \(\hat{\beta}_0\), along with the slope, \(\hat{\beta}_1\). This case is appropriate for the vast majority of applications. Occasionally, we may want to estimate the relationship between y and x assuming that the intercept is zero (so that x = 0 implies that \(\hat{y} = 0\)); we cover this case briefly in Section 2.6. Unless explicitly stated otherwise, we always estimate an intercept along with a slope.

2.3 Properties of OLS on Any Sample of Data

In the previous section, we went through the algebra of deriving the formulas for the OLS intercept and slope estimates. In this section, we cover some further algebraic properties of the fitted OLS regression line. The best way to think about these properties is to remember that they hold, by construction, for any sample of data. The harder task, considering the properties of OLS across all possible random samples of data, is postponed until Section 2.5.

Several of the algebraic properties we are going to derive will appear mundane. Nevertheless, having a grasp of these properties helps us to figure out what happens to the OLS estimates and related statistics when the data are manipulated in certain ways, such as when the measurement units of the dependent and independent variables change.

Fitted Values and Residuals

We assume that the intercept and slope estimates, \(\hat{\beta}_0\) and \(\hat{\beta}_1\), have been obtained for the given sample of data. Given \(\hat{\beta}_0\) and \(\hat{\beta}_1\), we can obtain the fitted value \(\hat{y}_i\) for each observation. [This is given by equation (2.20).] By definition, each fitted value \(\hat{y}_i\) is on the OLS regression line.
The OLS residual associated with observation i, \(\hat{u}_i\), is the difference between \(y_i\) and its fitted value, as given in equation (2.21). If \(\hat{u}_i\) is positive, the line underpredicts \(y_i\); if \(\hat{u}_i\) is negative, the line overpredicts \(y_i\). The ideal case for observation i is when \(\hat{u}_i = 0\), but in most cases, every residual is not equal to zero. In other words, none of the data points need actually lie on the OLS line.

EXAMPLE 2.6 CEO SALARY AND RETURN ON EQUITY

Table 2.2 contains a listing of the first 15 observations in the CEO data set, along with the fitted values, called salaryhat, and the residuals, called uhat.

The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm's roe, these CEOs make less than what we predicted. As can be seen from the positive uhat, the fifth CEO makes more than predicted from the OLS regression line.
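As a minimal sketch of how fitted values and residuals are obtained, the Python below fits a line to five made-up observations (they are not taken from the CEO data set) using the usual OLS slope and intercept formulas, then labels each observation by the sign of its residual. The helper name `ols_fit` is an assumption of this illustration.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the OLS line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.3, 9.7]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # points on the OLS line
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # actual minus fitted

for xi, yi, fi, ui in zip(x, y, fitted, residuals):
    if ui > 0:
        verdict = "line underpredicts y"
    elif ui < 0:
        verdict = "line overpredicts y"
    else:
        verdict = "point lies on the line"
    print(f"x={xi:.1f}  y={yi:.2f}  fitted={fi:.2f}  residual={ui:+.2f}  {verdict}")
```

With these numbers the residuals take both signs, and each observation equals its fitted value plus its residual by construction.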

[TABLE 2.2: Fitted Values and Residuals for the First 15 CEOs. Columns: obsno, roe, salary, salaryhat, uhat. © Cengage Learning, 2013]

Algebraic Properties of OLS Statistics

There are several useful algebraic properties of OLS estimates and their associated statistics. We now cover the three most important of these.

(1) The sum, and therefore the sample average of the OLS residuals, is zero. Mathematically,

\[ \sum_{i=1}^{n} \hat{u}_i = 0. \]  [2.30]

This property needs no proof; it follows immediately from the OLS first order condition (2.14), when we remember that the residuals are defined by \(\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\). In other words, the OLS estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are chosen to make the residuals add up to zero (for any data set). This says nothing about the residual for any particular observation i.

(2) The sample covariance between the regressors and the OLS residuals is zero. This follows from the first order condition (2.15), which can be written in terms of the residuals as

\[ \sum_{i=1}^{n} x_i \hat{u}_i = 0. \]  [2.31]

The sample average of the OLS residuals is zero, so the left-hand side of (2.31) is proportional to the sample covariance between \(x_i\) and \(\hat{u}_i\).

(3) The point \((\bar{x}, \bar{y})\) is always on the OLS regression line. In other words, if we take equation (2.23) and plug in \(\bar{x}\) for x, then the predicted value is \(\bar{y}\). This is exactly what equation (2.16) showed us.
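The three properties can be verified numerically on any data set. The sketch below uses a small made-up sample (not WAGE1.RAW or the CEO data) and a hypothetical helper `ols_fit`; each check should hold up to floating-point rounding.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the OLS line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical sample used only for illustration.
x = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 18.0]
y = [3.1, 4.0, 5.2, 6.1, 6.0, 8.4, 9.9]

b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Property (1): the residuals sum to zero.
sum_resid = sum(residuals)

# Property (2): the residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))

# Property (3): the line passes through the point of sample means.
prediction_at_x_bar = b0 + b1 * mean(x)

print(f"sum of residuals:       {sum_resid:.2e}")
print(f"sum of x_i * residual:  {sum_x_resid:.2e}")
print(f"prediction at x-bar:    {prediction_at_x_bar:.4f}  (y-bar = {mean(y):.4f})")
```

Changing the data changes the estimates but not these three results, since they follow from the first order conditions rather than from anything special about the sample.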

EXAMPLE 2.7 WAGE AND EDUCATION

For the data in WAGE1.RAW, the average hourly wage in the sample is 5.90, rounded to two decimal places, and the average education is 12.56. If we plug educ = 12.56 into the OLS regression line (2.27), we get \(\widehat{wage} = -0.90 + 0.54(12.56) = 5.8824\), which equals 5.9 when rounded to the first decimal place. These figures do not exactly agree because we have rounded the average wage and education, as well as the intercept and slope estimates. If we did not initially round any of the values, we would get the answers to agree more closely, but to little useful effect.

Writing each \(y_i\) as its fitted value, plus its residual, provides another way to interpret an OLS regression. For each i, write

\[ y_i = \hat{y}_i + \hat{u}_i. \]  [2.32]

From property (1), the average of the residuals is zero; equivalently, the sample average of the fitted values, \(\hat{y}_i\), is the same as the sample average of the \(y_i\), or \(\bar{\hat{y}} = \bar{y}\). Further, properties (1) and (2) can be used to show that the sample covariance between \(\hat{y}_i\) and \(\hat{u}_i\) is zero. Thus, we can view OLS as decomposing each \(y_i\) into two parts, a fitted value and a residual. The fitted values and residuals are uncorrelated in the sample.

Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) (also known as the sum of squared residuals), as follows:

\[ SST \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2. \]  [2.33]

\[ SSE \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2. \]  [2.34]

\[ SSR \equiv \sum_{i=1}^{n} \hat{u}_i^2. \]  [2.35]

SST is a measure of the total sample variation in the \(y_i\); that is, it measures how spread out the \(y_i\) are in the sample. If we divide SST by n − 1, we obtain the sample variance of y, as discussed in Appendix C. Similarly, SSE measures the sample variation in the \(\hat{y}_i\) (where we use the fact that \(\bar{\hat{y}} = \bar{y}\)), and SSR measures the sample variation in the \(\hat{u}_i\). The total variation in y can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR. Thus,

\[ SST = SSE + SSR. \]  [2.36]

Proving (2.36) is not difficult, but it requires us to use all of the properties of the summation operator covered in Appendix A.
Write

\[
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 \\
&= \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2 \\
&= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= SSR + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE.
\end{aligned}
\]

Now, (2.36) holds if we show that

\[ \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0. \]  [2.37]

But we have already claimed that the sample covariance between the residuals and the fitted values is zero, and this covariance is just (2.37) divided by n − 1. Thus, we have established (2.36).

Some words of caution about SST, SSE, and SSR are in order. There is no uniform agreement on the names or abbreviations for the three quantities defined in equations (2.33), (2.34), and (2.35). The total sum of squares is called either SST or TSS, so there is little confusion here. Unfortunately, the explained sum of squares is sometimes called the "regression sum of squares." If this term is given its natural abbreviation, it can easily be confused with the term "residual sum of squares." Some regression packages refer to the explained sum of squares as the "model sum of squares."

To make matters even worse, the residual sum of squares is often called the "error sum of squares." This is especially unfortunate because, as we will see in Section 2.5, the errors and the residuals are different quantities. Thus, we will always call (2.35) the residual sum of squares or the sum of squared residuals. We prefer to use the abbreviation SSR to denote the sum of squared residuals, because it is more common in econometric packages.

Goodness-of-Fit

So far, we have no way of measuring how well the explanatory or independent variable, x, explains the dependent variable, y. It is often useful to compute a number that summarizes how well the OLS regression line fits the data. In the following discussion, be sure to remember that we assume that an intercept is estimated along with the slope.

Assuming that the total sum of squares, SST, is not equal to zero (which is true except in the very unlikely event that all the \(y_i\) equal the same value), we can divide (2.36) by SST to get 1 = SSE/SST + SSR/SST. The R-squared of the regression, sometimes called the coefficient of determination, is defined as

\[ R^2 \equiv SSE/SST = 1 - SSR/SST. \]  [2.38]

\(R^2\) is the ratio of the explained variation compared to the total variation; thus, it is interpreted as the fraction of the sample variation in y that is explained by x. The second equality in (2.38) provides another way for computing \(R^2\).

From (2.36), the value of \(R^2\) is always between zero and one, because SSE can be no greater than SST. When interpreting \(R^2\), we usually multiply it by 100 to change it into a percent: \(100 \cdot R^2\) is the percentage of the sample variation in y that is explained by x.

If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case, \(R^2 = 1\). A value of \(R^2\) that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the \(y_i\) is captured by the variation in the \(\hat{y}_i\) (which all lie on the OLS regression line). In fact, it can be shown that \(R^2\) is equal to the square of the sample correlation coefficient between \(y_i\) and \(\hat{y}_i\). This is where the term "R-squared" came from. (The letter R was traditionally used to denote an estimate of a population correlation coefficient, and its usage has survived in regression analysis.)
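The decomposition (2.36) and both expressions for R-squared in (2.38) can be confirmed numerically. The following Python sketch uses made-up data and a hypothetical helper `ols_fit`; it also checks the claim that R-squared equals the squared sample correlation between the actual and fitted values.

```python
from math import sqrt
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the OLS line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical sample used only for illustration.
x = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
y = [1.5, 3.9, 4.2, 7.1, 7.4, 10.3]

b0, b1 = ols_fit(x, y)
y_bar = mean(y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sst = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
sse = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
ssr = sum(ui ** 2 for ui in residuals)         # residual sum of squares

r_squared = sse / sst
r_squared_alt = 1 - ssr / sst

# Sample correlation between actual and fitted values, computed by hand.
fitted_bar = mean(fitted)
cov_y_fitted = sum((yi - y_bar) * (fi - fitted_bar) for yi, fi in zip(y, fitted))
var_fitted = sum((fi - fitted_bar) ** 2 for fi in fitted)
corr_y_fitted = cov_y_fitted / sqrt(sst * var_fitted)

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")
print(f"R-squared = {r_squared:.4f} (alternative formula: {r_squared_alt:.4f})")
print(f"squared corr(y, fitted) = {corr_y_fitted ** 2:.4f}")
```

The equalities rely on an intercept being estimated, which is the assumption stated at the start of the goodness-of-fit discussion.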

EXAMPLE 2.8 CEO SALARY AND RETURN ON EQUITY

In the CEO salary regression, we obtain the following:

\[ \widehat{salary} = 963.191 + 18.501\,roe \]  [2.39]
n = 209, \(R^2 = 0.0132\).

We have reproduced the OLS regression line and the number of observations for clarity. Using the R-squared (rounded to four decimal places) reported for this equation, we can see how much of the variation in salary is actually explained by the return on equity. The answer is: not much. The firm's return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs. That means that 98.7% of the salary variations for these CEOs is left unexplained! This lack of explanatory power may not be too surprising because many other characteristics of both the firm and the individual CEO should influence salary; these factors are necessarily included in the errors in a simple regression analysis.

In the social sciences, low R-squareds in regression equations are not uncommon, especially for cross-sectional analysis. We will discuss this issue more generally under multiple regression analysis, but it is worth emphasizing now that a seemingly low R-squared does not necessarily mean that an OLS regression equation is useless. It is still possible that (2.39) is a good estimate of the ceteris paribus relationship between salary and roe; whether or not this is true does not depend directly on the size of R-squared. Students who are first learning econometrics tend to put too much weight on the size of the R-squared in evaluating regression equations. For now, be aware that using R-squared as the main gauge of success for an econometric analysis can lead to trouble.

Sometimes, the explanatory variable explains a substantial part of the sample variation in the dependent variable.

EXAMPLE 2.9 VOTING OUTCOMES AND CAMPAIGN EXPENDITURES

In the voting outcome equation in (2.28), \(R^2 = 0.856\). Thus, the share of campaign expenditures explains over 85% of the variation in the election outcomes for this sample. This is a sizable portion.
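The caution in Example 2.8 that a low R-squared does not make a regression useless can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are generated artificially (they are not the CEO data), the true intercept, slope, and error standard deviation are arbitrary choices, and the error is drawn independently of x.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

# Assumed population model for the simulation: y = 1 + 2x + u,
# with u much noisier than the systematic part.
TRUE_INTERCEPT = 1.0
TRUE_SLOPE = 2.0
NOISE_SD = 10.0
n = 5000

x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0.0, NOISE_SD) for xi in x]

# OLS slope and intercept from the usual formulas.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# R-squared as one minus SSR/SST.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ssr / sst

print(f"estimated slope: {b1:.3f} (true slope {TRUE_SLOPE})")
print(f"R-squared:       {r_squared:.3f}")
```

In this setup most of the variation in y comes from the error term, so R-squared is small, yet the slope estimate lands near the value used to generate the data. The simulation says nothing about whether (2.39) itself is a good estimate, which depends on assumptions discussed later in the chapter.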
2.4 Units of Measurement and Functional Form

Two important issues in applied economics are (1) understanding how changing the units of measurement of the dependent and/or independent variables affects OLS estimates and (2) knowing how to incorporate popular functional forms used in economics into regression analysis. The mathematics needed for a full understanding of functional form issues is reviewed in Appendix A.

Part 1 of the text covers regression analysis with cross-sectional data. It builds upon a solid

Part 1 of the text covers regression analysis with cross-sectional data. It builds upon a solid Part 1 Regressio Aalysis with Cross-Sectioal Data Part 1 of the text covers regressio aalysis with cross-sectioal data. It builds upo a solid base of college algebra ad basic cocepts i probability ad statistics.

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Liear Regressio 2.1 Simple liear model The simple liear regressio model shows how oe kow depedet variable is determied by a sigle explaatory variable (regressor). Is is writte as: Y i

More information

Simple Regression Model

Simple Regression Model Simple Regressio Model 1. The Model y i 0 1 x i u i where y i depedet variable x i idepedet variable u i disturbace/error term i 1,..., Eg: y wage (measured i 1976 dollars per hr) x educatio (measured

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Statistical Properties of OLS estimators

Statistical Properties of OLS estimators 1 Statistical Properties of OLS estimators Liear Model: Y i = β 0 + β 1 X i + u i OLS estimators: β 0 = Y β 1X β 1 = Best Liear Ubiased Estimator (BLUE) Liear Estimator: β 0 ad β 1 are liear fuctio of

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So, 0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information

The Simple Regression Model

The Simple Regression Model The Simple Regressio Model Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) SLR 1 / 75 Defiitio of the Simple Regressio Model Defiitio of the Simple Regressio Model Pig Yu (HKU)

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Regression Analysis with Cross-Sectional Data

Regression Analysis with Cross-Sectional Data 89782_02_c02_p023-072.qxd 5/25/05 11:46 AM Page 23 PART 1 Regression Analysis with Cross-Sectional Data P art 1 of the text covers regression analysis with cross-sectional data. It builds upon a solid

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

STP 226 EXAMPLE EXAM #1

STP 226 EXAMPLE EXAM #1 STP 226 EXAMPLE EXAM #1 Istructor: Hoor Statemet: I have either give or received iformatio regardig this exam, ad I will ot do so util all exams have bee graded ad retured. PRINTED NAME: Siged Date: DIRECTIONS:

More information

Lesson 11: Simple Linear Regression

Lesson 11: Simple Linear Regression Lesso 11: Simple Liear Regressio Ka-fu WONG December 2, 2004 I previous lessos, we have covered maily about the estimatio of populatio mea (or expected value) ad its iferece. Sometimes we are iterested

More information

Chapter 6: The Simple Regression Model

Chapter 6: The Simple Regression Model Chapter 6: The Simple Regressio Model Statistics ad Itroductio to Ecoometrics M. Ageles Carero Departameto de Fudametos del Aálisis Ecoómico Year 2014-15 M. Ageles Carero (UA) Chapter 6: SRM Year 2014-15

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Estimation of a population proportion March 23,

Estimation of a population proportion March 23, 1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes

More information

Kinetics of Complex Reactions

Kinetics of Complex Reactions Kietics of Complex Reactios by Flick Colema Departmet of Chemistry Wellesley College Wellesley MA 28 wcolema@wellesley.edu Copyright Flick Colema 996. All rights reserved. You are welcome to use this documet

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N.

3/3/2014. CDS M Phil Econometrics. Types of Relationships. Types of Relationships. Types of Relationships. Vijayamohanan Pillai N. 3/3/04 CDS M Phil Old Least Squares (OLS) Vijayamohaa Pillai N CDS M Phil Vijayamoha CDS M Phil Vijayamoha Types of Relatioships Oly oe idepedet variable, Relatioship betwee ad is Liear relatioships Curviliear

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Confidence Intervals for the Population Proportion p

Confidence Intervals for the Population Proportion p Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:

More information

Exponents. Learning Objectives. Pre-Activity

Exponents. Learning Objectives. Pre-Activity Sectio. Pre-Activity Preparatio Epoets A Chai Letter Chai letters are geerated every day. If you sed a chai letter to three frieds ad they each sed it o to three frieds, who each sed it o to three frieds,

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

April 13, 2018. Name: Hypothesis Testing, Discussion #10. Define these terms below as they relate to hypothesis testing. a) Data Generation Model: Solution: A set

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

Recall: Comments: So far we have estimates of the parameters $\beta_0$ and $\beta_1$, but have no idea how good these estimates are. Assumption: $E(Y \mid x) = \beta_0 + \beta_1 x$ (linear conditional

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

CS 70 Discrete Mathematics for CS, Spring 2008, David Wagner. Note 9: Variance. Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk one

Stat 421-SP2012 Interval Estimation Section

Section 11.1-11. We now understand (Chapter 10) how to find point estimators of an unknown parameter. However, a point estimate does not provide any information about the uncertainty (possible

ANALYSIS OF EXPERIMENTAL ERRORS

All physical measurements encountered in the verification of physics theories and concepts are subject to uncertainties that depend on the measuring instruments used and the conditions under

This is an introductory course in Analysis of Variance and Design of Experiments.

Notes for M 384E, Wednesday, January 21, 2009. (Please note: I will not pass out hard-copy class notes in future classes. If there are written class notes, they will be posted on the web by the night before class

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

5.1 Estimating a Normal Mean; 5.2 The Distribution of the Normal Sample Mean; 5.3 Normal data, confidence interval for, known; 5.4 Normal data, confidence

Problem Set 4 Due Oct, 12

EE226: Random Processes in Systems, Fall 06. Lecturer: Jean C. Walrand. GSI: Assane Gueye. This problem set essentially reviews detection theory and hypothesis testing and some basic notions

Chapter 6 Sampling Distributions

In most experiments, we have more than one measurement for any given variable, each measurement being associated with one randomly selected member of a population. Hence we need to

The Growth of Functions. Theoretical Supplement

The Triangle Inequality: The triangle inequality is an algebraic tool that is often useful in manipulating absolute values of functions. The triangle inequality says that

1 Approximating Integrals using Taylor Polynomials

Seunghee Ye, Ma 8: Week 7, Nov. Week 7 Summary: This week, we will learn how we can approximate integrals using Taylor series and numerical methods. Topics: Approximating Integrals using Taylor Polynomials; Definitions

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics

Semester 1, 2018/2019. Dr. Anthony Brown. 8. Statistics. 8.1. Measures of Centre: Mean, Median and Mode. If we have a series of numbers then

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

STATISTICAL INFERENCE. INTRODUCTION: Statistical inference is that branch of Statistics in which one typically makes a statement about a population based upon the results of a sample. In one-sample testing, we essentially

Solutions to Odd Numbered End of Chapter Exercises: Chapter 4

Introduction to Econometrics (3rd Updated Edition) by James H. Stock and Mark W. Watson. (This version July 2, 24)

4.3 Growth Rates of Solutions to Recurrences

4.3.1 Divide and Conquer Algorithms: One of the most basic and powerful algorithmic techniques is divide and conquer.

Activity 3: Length Measurements with the Four-Sided Meter Stick

OBJECTIVE: The purpose of this experiment is to study errors and the propagation of errors when experimental data derived using a four-sided meter

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

EXST30 Background material, from the textbook The Statistical Sleuth. Mean [0]: In your text the word mean denotes a population mean (µ) while the word average denotes a sample average ( ). Variance [0]: The

A statistical method to determine sample size to estimate characteristic value of soil parameters

Y. Honjo, B. Setiawan and M. Suzuki. Abstract: Sample size is an important factor to be considered in determining

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

MATH-252-01: Probability and Statistics II, Spring 2019. Contents: 1 Chi-Squared Tests with Known Probabilities; 1.1 Chi-Squared Testing

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

By handing in this completed exam, I state that I have neither given nor received assistance from another person during the exam period. I have used no resources other than the exam itself and the basic

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

A random sample is a sequence of random variables $X_1, X_2, \ldots, X_n$ that are independent and identically distributed. This property is often abbreviated

Regression and correlation

Contents: 1. Regression 2. Correlation. Learning outcomes: You will learn how to explore relationships between variables and how to measure the strength of such relationships. You should note

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Key Concepts: 1) Sketching of scatter diagram. The scatter diagram of bivariate (i.e. containing two variables) data can be easily obtained using GC. Students are advised to refer to lecture notes for the GC operations

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

EEL5: Discrete-Time Signals and Systems. Introduction: In this set of notes, we begin our mathematical treatment of discrete-time systems. As shown in Figure, a discrete-time system operates on or transforms some input sequence x [

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

Read Section 1.5 (pages 5-9). Overview: In Section 1.5 we learn to work with summation notation and formulas. We will also introduce a brief overview of sequences,

PROVING CAUSALITY IN SOCIAL SCIENCE: A POTENTIAL APPLICATION OF OLOGS

By Noam Angrist. THE GOALS OF SOCIAL SCIENCE: Explain the world around us. What is really happening and why. Example: do Kindles boost test

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day

LECTURE # 8: Mean Deviation, Standard Deviation and Variance, and Coefficient of Variation. First, we will discuss it for the case of raw data, and then

Regression, Part I. A) Correlation describes the relationship between two variables, where neither is independent or a predictor.

I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would be irrelevant

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

MODULE 5: STATISTICS II. 1. Mean and standard error of sample data 2. Binomial distribution 3. Normal distribution 4. Sampling 5. Confidence intervals

Simple Linear Regression

1. Model and Parameter Estimation (a) Suppose our data consist of a collection of pairs $(x_i, y_i)$, where $x_i$ is an observed value of variable X and $y_i$ is the corresponding observation

BIOSTATS 640 Intermediate Biostatistics Frequently Asked Questions Topic 1 FAQ 1 Review of BIOSTATS 540 Introductory Biostatistics

I'm confused about the jargon and notation, especially population versus sample. Could

(X i X)(Y i Y ) = 1 n

10. Linear Regression. In Chapter 6 we discussed the concepts of covariance and correlation, two ways of measuring the extent to which two random variables, X and Y, were related to each other. In many

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Parameter, Estimator, and Estimate: The normal probability density function is fully characterized by two constants: population

Regression and Correlation

Contents: 43.1 Regression; 43.2 Correlation. Learning outcomes: You will learn how to explore relationships between variables and how to measure the strength of such relationships. You should

Lecture 2: Monte Carlo Simulation

STAT/Q SCI 43: Introduction to Resampling Methods, Spring 27. Instructor: Yen-Chi Chen. Monte Carlo Integration: Assume we want to evaluate the following integration: e x3 dx. What can we do?

Lecture 11 Simple Linear Regression

Fall 2013. Prof. Yao Xie, yao.xie@isye.gatech.edu, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Tech. Midterm 2: mean 91.2, median 93.75, std 6.5. Meddicorp

MA Advanced Econometrics: Properties of Least Squares Estimators

Karl Whelan, School of Economics, UCD. February 5, 20. Part I: Least Squares: Some Finite-Sample

Economics 326 Methods of Empirical Research in Economics. Lecture 8: Multiple regression model

Hiro Kasahara, University of British Columbia. December 24, 2014. Why we need a multiple regression model: There are

MEASURES OF DISPERSION (VARIABILITY)

POLI 300 Handout #7, N. R. Miller. While measures of central tendency indicate what value of a variable is (in one sense or other, e.g., mode, median, mean), average or central

Midterm 2 ECO3151. Winter 2012

Name: Student Number: Instructions: 1. Print your name and student number at the top of this midterm 2. No programmable calculators 3. You can answer in pencil or pen 4. This midterm consists

Parameter, Statistic and Random Samples

A parameter is a number that describes the population. It is a fixed number, but in practice we do not know its value. A statistic is a function of the sample data, i.e.,

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

We have been dealing with situations in which we have full knowledge of the distribution of a random variable.

3.2 Properties of Division 3.3 Zeros of Polynomials 3.4 Complex and Rational Zeros of Polynomials

Math 60, www.timetodare.com. In these sections we will study polynomials algebraically. Most of our work will be concerned

Mathematical Induction

Introduction: Mathematical induction, or just induction, is a proof technique. Suppose that for every natural number n, P(n) is a statement. We wish to show that all statements P(n) are true. In a

Chapter 23: Inferences About Means

Enough Proportions! We've spent the last two units working with proportions (or qualitative variables, at least); now it's time to turn our attentions to quantitative variables. For

10-701/ Machine Learning Mid-term Exam Solution

Your Name: Your Andrew ID: True or False (give one sentence explanation) (20%). (F) For a continuous random variable x and its probability distribution function p(x), it

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

8.1 Random Sampling: The basic idea of the statistical inference is that we are allowed to draw inferences or conclusions about a population based

Chapter 10: Power Series

Chapter Overview: Power Series. The reason series are part of a Calculus course is that there are functions which cannot be integrated. All power series, though, can be integrated because

Introduction to Econometrics (3 rd Updated Edition) Solutions to Odd- Numbered End- of- Chapter Exercises: Chapter 4

By James H. Stock and Mark W. Watson. (This version August 7, 2014.) 2015 Pearson Education, Inc.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student's t Distribution

By Hiro Kasahara. Sample Mean: We consider a random sample from a population. Definition

Correlation and Covariance

Tom Ilvento, FREC 9. What is Next? Correlation and Regression. Regression: We specify a dependent variable as a linear function of one or more independent variables, based on co-variance. Regression

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

Today: Introduction to parameter estimation. Two methods for parameter estimation: Maximum Likelihood Estimation and Bayesian Estimation. Introduction: Bayesian Decision

x a x a Lecture 2 Series (See Chapter 1 in Boas)

A basic and very powerful (if pedestrian, recall we are lazy AND smart) way to solve any differential (or integral) equation is via a series expansion of the corresponding solution

MA131 - Analysis 1. Workbook 2 Sequences I

Autumn 2013. Contents: 2 Sequences I; 2.1 Introduction; 2.2 Increasing and Decreasing Sequences; 2.3 Bounded Sequences

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

Part III. Areal Data Analysis. While the notion of relative likelihood values for different models is somewhat difficult to interpret directly (as mentioned above),