
10 LINEAR REGRESSION

In Chapter 6 we discussed the concepts of covariance and correlation, two ways of measuring the extent to which two random variables, $X$ and $Y$, were related to each other. In many cases we would like to take this a step further and try to use information from one variable to make predictions about the outcome of the other.

10.1 sample covariance and correlation

We have so far considered summarizing a set of observations where one measurement is made on each individual or unit, but often in real-life random experiments we make multiple measurements on each individual. For example, during a health check-up a doctor might record the height, weight, age, sex, pulse rate, and blood pressure. Just as we did for single measurements, we can represent the observed data by their empirical distribution, which is now a function of multiple arguments. For example, if we measure two random variables $(X_i, Y_i)$ for the $i$th individual (say weight and blood pressure), then the empirical distribution function is given by

$$f(t, s) = \frac{1}{n}\,\#\{i : X_i = t,\ Y_i = s\}.$$

We can now use this to estimate population features by the corresponding feature of the empirical distribution. For example, the population covariance

$$\operatorname{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$

gives a measure of how $X$ and $Y$ relate to each other. The sample version of this is the sample covariance

$$S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i Y_i - n\bar{X}\,\bar{Y}\right). \tag{10.1.1}$$

The sample correlation coefficient is defined similarly to the population correlation coefficient $\rho[X, Y]$ as

$$r[X, Y] = \frac{S_{XY}}{S_X S_Y}, \tag{10.1.2}$$

where $S_X$ and $S_Y$ are the sample standard deviations of $X$ and $Y$ respectively. As with $\rho[X, Y]$, $r[X, Y]$ is bounded between $-1$ and $1$, and is invariant to scale and location transformations. That is, for real numbers $a, b, c, d$ with $a > 0$ and $c > 0$,

$$r[aX + b,\ cY + d] = r[X, Y].$$

10.2 simple linear model

We will assume that the variable $Y$ depends on $X$ in a linear fashion, but that it is also affected by random factors.
Specifically, we will assume there is a regression line $y = \alpha + \beta x$ and that for given x-values $X_1, X_2, \dots, X_n$ the corresponding y-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \alpha + \beta X_j + \epsilon_j, \tag{10.2.1}$$

for $j = 1, 2, \dots, n$, where the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \text{Normal}(0, \sigma^2)$. Equation (10.2.1) is referred to as the simple linear model. In particular, $\epsilon_j$ is the (random) vertical distance of the point $(X_j, Y_j)$ from the regression line.

For all results below we assume $\sigma^2 > 0$ is the variance of the errors, assumed to be the same for every data point. We also assume that not all of the $X_j$ quantities are the same, so that the variance of these quantities is non-zero. In particular, this means $n \ge 2$ and $S_X^2 > 0$.

Version: April 25, 2016
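As a rough illustration of the simple linear model (10.2.1), the sketch below generates y-values from fixed x-values by adding independent normal errors to a regression line. The parameter values ($\alpha = 2$, $\beta = 0.5$, $\sigma = 1$), the x-values, and the function name `simulate_simple_linear_model` are assumptions chosen only for this illustration; they do not come from the text.

```python
import random


def simulate_simple_linear_model(xs, alpha, beta, sigma, rng):
    """Return Y_j = alpha + beta * X_j + eps_j with eps_j ~ Normal(0, sigma^2)."""
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameters and x-values, chosen only for illustration.
alpha, beta, sigma = 2.0, 0.5, 1.0
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

rng = random.Random(0)  # fixed seed so that the run is repeatable
ys = simulate_simple_linear_model(xs, alpha, beta, sigma, rng)

for x, y in zip(xs, ys):
    # y - (alpha + beta * x) is the realized error eps_j for this point
    print(f"x = {x:.1f}  y = {y:.3f}  error = {y - (alpha + beta * x):+.3f}")
```

Each printed error is one realization of $\epsilon_j$, the vertical distance of the simulated point from the regression line.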

10.3 the least squares line

The values of $(X_1, Y_1), \dots, (X_n, Y_n)$ are collected data. Though we assume that this data is produced via the simple linear model, we typically do not know the actual values of the slope $\beta$ or the y-intercept $\alpha$. The goal of this section is to illustrate a way to estimate these values from the data.

For a line $y = a + bx$ the residual of a data point $(X_j, Y_j)$ is defined to be the quantity $Y_j - (a + bX_j)$. This is the difference between the actual y-value of the data point and the location where the line predicts the y-value should be. In other words, it may be viewed as the error of the line when attempting to predict the y-value corresponding to the $X_j$ data point. Among all possible lines through the data, there is one which minimizes the sum of these squared residual errors. This is called the least squares line.

Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be points on the plane. Suppose we wish to find a line that minimizes the sum of squared residual errors. That is, let $g : \mathbb{R}^2 \to \mathbb{R}$ be defined as

$$g(a, b) = \sum_{j=1}^{n} [Y_j - (a + bX_j)]^2.$$

The objective is to minimize $g$. Since $g$ is a convex quadratic function of $(a, b)$, any critical point is a global minimum. So using calculus,

$$0 = \frac{\partial g}{\partial a} = -2\sum_{j=1}^{n} [Y_j - a - bX_j] \tag{10.3.1}$$

and

$$0 = \frac{\partial g}{\partial b} = -2\sum_{j=1}^{n} X_j [Y_j - a - bX_j]. \tag{10.3.2}$$

From equation (10.3.1) we have

$$0 = \sum_{j=1}^{n} [Y_j - a - bX_j] = \sum_{j=1}^{n} Y_j - na - b\sum_{j=1}^{n} X_j = n\bar{Y} - na - bn\bar{X} = n\,(\bar{Y} - (a + b\bar{X})).$$

Therefore [1]

$$\bar{Y} = a + b\bar{X}, \tag{10.3.3}$$

which shows that the point $(\bar{X}, \bar{Y})$ must lie on the least squares line. The point $(\bar{X}, \bar{Y})$ is known as the point of averages. Similarly, from equation (10.3.2),

$$0 = \sum_{j=1}^{n} X_j [Y_j - a - bX_j] = \sum_{j=1}^{n} (X_j Y_j - aX_j - bX_j^2) = \sum_{j=1}^{n} X_j Y_j - an\bar{X} - b\sum_{j=1}^{n} X_j^2,$$

so that

$$\sum_{j=1}^{n} X_j Y_j = an\bar{X} + b\sum_{j=1}^{n} X_j^2. \tag{10.3.4}$$

We now use the system of two equations given by (10.3.3) and (10.3.4) to solve for $a$ and $b$. Substituting $a = \bar{Y} - b\bar{X}$ into (10.3.4) gives $\sum_{j} X_j Y_j = n\bar{X}\,\bar{Y} + b\left(\sum_{j} X_j^2 - n\bar{X}^2\right)$, and hence

$$b = \frac{\left(\sum_{j=1}^{n} X_j Y_j\right) - n\bar{X}\,\bar{Y}}{\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2}. \tag{10.3.5}$$

The denominator is non-zero because not all of the $X_j$ are the same.

[1] We shall use the notation $\bar{X}$, $\bar{Y}$, $S_X$, $S_Y$, $r[X, Y]$ (below), even though they are not necessarily random quantities. This is to simplify notation and will allow us to use known properties, in the event they are random.
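The closed-form solution in (10.3.5) and (10.3.3) can be sketched in a few lines of code. The example below is an illustrative sketch rather than part of the original text: the function names `least_squares_line` and `sum_squared_residuals` are hypothetical, and the five data points are the ones used in the worked example of this section.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)  # equation (10.3.5)
    a = y_bar - b * x_bar                                         # equation (10.3.3)
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Return g(a, b), the sum of squared residuals of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [3, 4, 5, 6, 7]
ys = [6, 5, 6, 4, 2]
a, b = least_squares_line(xs, ys)

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"g(a, b)       = {sum_squared_residuals(xs, ys, a, b):.4f}")
print(f"g(a + 0.1, b) = {sum_squared_residuals(xs, ys, a + 0.1, b):.4f}")
print(f"g(a, b + 0.1) = {sum_squared_residuals(xs, ys, a, b + 0.1):.4f}")
```

Perturbing either coefficient increases the sum of squared residuals, which is consistent with $(a, b)$ being the minimizer of $g$.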

Recall that the sample variance of $X_1, X_2, \dots, X_n$ is

$$\begin{aligned}
S_X^2 &= \frac{1}{n-1}\sum_{j=1}^{n} (X_j - \bar{X})^2 \\
&= \frac{1}{n-1}\sum_{j=1}^{n} \left[X_j^2 - 2X_j\bar{X} + \bar{X}^2\right] \\
&= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n} X_j^2\Big) - 2\bar{X}\Big(\sum_{j=1}^{n} X_j\Big) + n\bar{X}^2\right] \\
&= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n} X_j^2\Big) - 2n\bar{X}^2 + n\bar{X}^2\right] \\
&= \frac{1}{n-1}\left[\Big(\sum_{j=1}^{n} X_j^2\Big) - n\bar{X}^2\right].
\end{aligned}$$

Therefore, the denominator of (10.3.5) is simply $(n-1)S_X^2$. The numerator may be written more simply by using the notation of sample covariance and correlation defined in (10.1.1) and (10.1.2). So from (10.3.5) we have

$$b = \frac{\left(\sum_{j=1}^{n} X_j Y_j\right) - n\bar{X}\,\bar{Y}}{\left(\sum_{j=1}^{n} X_j^2\right) - n\bar{X}^2} = \frac{(n-1)S_{XY}}{(n-1)S_X^2} = \frac{r[X, Y]\,S_Y}{S_X}.$$

Using the above and (10.3.3), we can also write a nice formula for $a$, which is

$$a = \bar{Y} - \frac{r[X, Y]\,S_Y}{S_X}\,\bar{X}. \tag{10.3.7}$$

By the above calculation we have shown that the least squares line minimizing the sum of the squared residual errors is the line passing through the point of averages $(\bar{X}, \bar{Y})$ and having a slope equal to $b = \frac{r[X, Y]\,S_Y}{S_X}$. We state this precisely in the Theorem below.

Theorem. Let $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ be given data points. Then the least squares line passes through $(\bar{X}, \bar{Y})$ and has slope given by $\frac{r[X, Y]\,S_Y}{S_X}$.

We illustrate the use of these formulas with two examples given below.

Example. Consider the following five data points:

    X | 3  4  5  6  7
    Y | 6  5  6  4  2

These points are not collinear, but suppose we wish to find a line that most closely approximates their trend in the least squares sense described above. Viewing these as samples, it is routine to calculate that the formulas above yield $a = 9.1$ and $b = -0.9$. Of all of the lines in the plane, the one that minimizes the sum of squared residual errors for the data set above is the line $y = 9.1 - 0.9x$.

The R software also has a feature to perform a regression directly. To obtain this result using R we could first create vectors that represent the data:

    > x <- c(3,4,5,6,7)
    > y <- c(6,5,6,4,2)

And then instruct R to perform the regression using the command lm, indicating the linear model.

    > lm(y ~ x)

The order of the variables in this command is important, with y ~ x indicating that the y variable is being predicted using the x variable as input. The resulting output from R is

    (Intercept)            x
            9.1         -0.9

the values of the intercept and slope of the least squares line respectively.

Example. Suppose as part of a health study, a researcher collects data for weights and heights of sixty adult men in a population. The average height of the men is 174 cm with a sample standard deviation of 8.0 cm. The average weight of the men is 78 kg with a sample standard deviation of 10 kg. The correlation between the variables in the sample was $r$ = [value missing]. This information alone is enough to find the least squares line for predicting weight from height. The reader may use the formulas above to verify that $b = r\,S_Y/S_X = 1.25\,r$ and $a = \bar{Y} - b\bar{X} = 78 - 174\,b$. Therefore, among all lines, $y = a + bx$ with these values is the one which minimizes the sum of squared residuals.

This does not necessarily mean this line would be appropriate for predicting new data points. To make such a declaration, we would want to have some evidence that the two variables had a linear relationship to begin with, but regardless of whether or not the data was produced from a simple linear model, the line above minimizes error in the least squares sense.

exercises

Ex. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data produced via the simple linear model and suppose $y = a + bx$ is the least squares line for the data. Recall from above that the residual for any given data point is $Y_j - (a + bX_j)$, the error the line makes in predicting the correct y-value from the given x-value. Show that the sum of the residuals over all data points must be zero.

Ex. Suppose that instead of using the simple linear model, we assume the regression line is known to pass through the origin.
That is, the regression line has the form $y = \beta x$ and for given x-values $X_1, X_2, \dots, X_n$ the corresponding y-values $Y_1, Y_2, \dots, Y_n$ are given by

$$Y_j = \beta X_j + \epsilon_j, \tag{10.3.8}$$

for $j = 1, 2, \dots, n$. As with the simple linear model, we assume the $\epsilon_j$ are independent random variables with $\epsilon_j \sim \text{Normal}(0, \sigma^2)$. (We will refer to this as the linear model through the origin and will have several exercises investigating how several formulas from this chapter would need to be modified for such a model.) Assuming data $(X_1, Y_1), \dots, (X_n, Y_n)$ was produced from the linear model through the origin, find the least squares line through the origin. That is, find a formula for $b$ such that the line $y = bx$ minimizes the sum of squared residual errors.

10.4 a and b as random variables

In this section (and the remainder of this chapter) we will assume that $(X_1, Y_1), \dots, (X_n, Y_n)$ follow the simple linear model (10.2.1). In other words, there is a regression line $y = \alpha + \beta x$, and for given x-values $X_1, X_2, \dots, X_n$ the corresponding y-values $Y_1, Y_2, \dots, Y_n$ are given by (10.2.1). In the previous section this data was used to produce the least squares line $y = a + bx$, which minimizes the sum of squared residual errors. In this section we investigate how well the random quantities $a$ and $b$ approximate the (unknown) values $\alpha$ and $\beta$.
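One way to see that $a$ and $b$ are random quantities is to simulate many data sets from the same model and refit the line each time. The sketch below does this under assumed values ($\alpha = 2$, $\beta = 0.5$, $\sigma = 1$, six fixed x-values); these values and the function names are hypothetical and serve only as an illustration. The slope is computed in the centered form $b = \sum_j (X_j - \bar{X})(Y_j - \bar{Y}) / \sum_j (X_j - \bar{X})^2$, which is algebraically the same as (10.3.5).

```python
import random
import statistics


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def simulate_fits(xs, alpha, beta, sigma, repetitions, seed=0):
    """Fit a least squares line to each of `repetitions` simulated data sets."""
    rng = random.Random(seed)
    fits = []
    for _ in range(repetitions):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        fits.append(least_squares_line(xs, ys))
    return fits


# Hypothetical model parameters and fixed x-values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fits = simulate_fits(xs, alpha=2.0, beta=0.5, sigma=1.0, repetitions=5000)
a_values = [a for a, _ in fits]
b_values = [b for _, b in fits]

print("average of a over simulations:", round(statistics.mean(a_values), 3))
print("average of b over simulations:", round(statistics.mean(b_values), 3))
print("sample variance of b:", round(statistics.variance(b_values), 4))
```

The individual fits differ from one simulated data set to the next, while their averages should land close to the assumed $\alpha$ and $\beta$. The sample variance of the simulated slopes can be compared with the variance formula stated in the theorem of this section.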

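Theorem. Under the assumptions of the simple linear model (10.2.1), the slope $b$ of the least squares line is a linear combination of the $Y_j$ variables. Further, it has a normal distribution with mean $\beta$ and variance $\frac{\sigma^2}{(n-1)S_X^2}$.

Proof - First recall that $X_1, X_2, X_3, \dots, X_n$ are assumed to be deterministic, so will be treated as known constants. The data points $Y_1, Y_2, \dots, Y_n$ are assumed to follow the simple linear model (10.2.1). So for $j = 1, \dots, n$,

$$E[Y_j] = E[\alpha + \beta X_j + \epsilon_j] = \alpha + \beta X_j + E[\epsilon_j] = \alpha + \beta X_j$$

and

$$\operatorname{Var}[Y_j] = \operatorname{Var}[\alpha + \beta X_j + \epsilon_j] = \operatorname{Var}[\epsilon_j] = \sigma^2.$$

Using the formula (10.3.5) we derived for $b$ and the above, we have

$$\begin{aligned}
E[b] &= E\left[\frac{\left(\sum_{j} X_j Y_j\right) - n\bar{X}\,\bar{Y}}{(n-1)S_X^2}\right] \\
&= \frac{1}{(n-1)S_X^2}\left[\Big(\sum_{j} X_j E[Y_j]\Big) - n\bar{X}E[\bar{Y}]\right] \\
&= \frac{1}{(n-1)S_X^2}\left[\Big(\sum_{j} X_j(\alpha + \beta X_j)\Big) - n\bar{X}(\alpha + \beta\bar{X})\right] \\
&= \frac{1}{(n-1)S_X^2}\left[\alpha n\bar{X} + \beta\Big(\sum_{j} X_j^2\Big) - \alpha n\bar{X} - \beta n\bar{X}^2\right] \\
&= \frac{\beta}{(n-1)S_X^2}\left[\Big(\sum_{j} X_j^2\Big) - n\bar{X}^2\right] = \beta.
\end{aligned}$$

For the variance, note that $\sum_j X_j Y_j$ and $\bar{Y}$ are not independent, so the covariance term must be included. Since $\operatorname{Cov}[Y_j, \bar{Y}] = \sigma^2/n$ for every $j$, we have $\operatorname{Cov}\big[\sum_j X_j Y_j, \bar{Y}\big] = \bar{X}\sigma^2$, and therefore

$$\begin{aligned}
\operatorname{Var}[b] &= \operatorname{Var}\left[\frac{\left(\sum_{j} X_j Y_j\right) - n\bar{X}\,\bar{Y}}{(n-1)S_X^2}\right] \\
&= \frac{1}{[(n-1)S_X^2]^2}\left[\Big(\sum_{j} X_j^2 \operatorname{Var}[Y_j]\Big) + n^2\bar{X}^2\operatorname{Var}[\bar{Y}] - 2n\bar{X}\operatorname{Cov}\Big[\sum_{j} X_j Y_j,\ \bar{Y}\Big]\right] \\
&= \frac{1}{[(n-1)S_X^2]^2}\left[\Big(\sum_{j} X_j^2\Big)\sigma^2 + n^2\bar{X}^2\,\frac{\sigma^2}{n} - 2n\bar{X}^2\sigma^2\right] \\
&= \frac{\sigma^2}{[(n-1)S_X^2]^2}\left[\Big(\sum_{j} X_j^2\Big) - n\bar{X}^2\right] = \frac{\sigma^2}{(n-1)S_X^2}.
\end{aligned}$$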

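The algebra below justifies that $b$ is a linear combination of the $Y_j$ variables.

$$b = \frac{\left(\sum_{j} X_j Y_j\right) - n\bar{X}\,\bar{Y}}{(n-1)S_X^2} = \frac{1}{(n-1)S_X^2}\left[\Big(\sum_{j} X_j Y_j\Big) - \Big(\sum_{j} \bar{X} Y_j\Big)\right] = \sum_{j=1}^{n} \frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j.$$

Since $b$ is a linear combination of the independent, normal random variables $Y_j$, $b$ itself is also a normal random variable (by the earlier theorem on linear combinations of independent normal random variables).

As noted above, the least squares line can be defined as the line of slope $b$ passing through the point of averages. The following lemma is a useful fact about how these quantities relate to each other.

Lemma. Let $b$ be the slope of the least squares line and let $\bar{Y}$ be the sample average of the $Y_j$ variables. Then $b$ and $\bar{Y}$ are independent.

Proof - Both $\bar{Y}$ and $b$ are linear combinations of the same independent normal random variables $Y_1, \dots, Y_n$, so each has a normal distribution and together they are jointly normal. By Theorem 6.4.3, all we have to show is that $\bar{Y}$ and $b$ are uncorrelated. Note that the $Y_j$ variables are all independent of each other, and so $\operatorname{Cov}[Y_j, Y_k]$ will be zero if $j \neq k$ and will equal the variance $\sigma^2$ otherwise. So,

$$\begin{aligned}
\operatorname{Cov}[b, \bar{Y}] &= \operatorname{Cov}\left[\sum_{j=1}^{n} \frac{X_j - \bar{X}}{(n-1)S_X^2}\,Y_j,\ \frac{1}{n}\sum_{k=1}^{n} Y_k\right] \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n} \frac{X_j - \bar{X}}{n(n-1)S_X^2}\operatorname{Cov}[Y_j, Y_k] \\
&= \sum_{j=1}^{n} \frac{X_j - \bar{X}}{n(n-1)S_X^2}\,\sigma^2 \\
&= \frac{\sigma^2}{n(n-1)S_X^2}\sum_{j=1}^{n} (X_j - \bar{X}) = 0.
\end{aligned}$$

We conclude this section with a result on the distribution of $a$.

Theorem. Under the assumptions of the simple linear model (10.2.1), the y-intercept $a$ (given by (10.3.7)) of the least squares line is a linear combination of the $Y_j$ variables. Further, it has a normal distribution with mean $\alpha$ and variance

$$\sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{(n-1)S_X^2}\right).$$

Proof - See the first exercise below.

exercises

Ex. Prove the theorem above on the distribution of $a$. (Hint: Make use of the fact that $\bar{Y} = a + b\bar{X}$ and what has previously been proven about $\bar{Y}$ and $b$.)

Ex. Show that, generally speaking, $a$ and $b$ are not independent. Find necessary and sufficient conditions for when the two variables are independent.

Ex. Show that $a$ and $\bar{Y}$ are never independent.

Ex. Continuing from the exercise in Section 10.3 on the linear model through the origin, assuming the regression line $y = \beta x$ passes through the origin and $y = bx$ is the least squares line through the origin, do the following:

(a) Find the expected value of $b$.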

(b) Find the variance of $b$.

(c) Determine whether or not $b$ has a normal distribution.

(d) Determine if $b$ and $\bar{Y}$ are independent.

10.5 predicting new data when $\sigma^2$ is known

In this section we return to the question of using data for prediction. We continue to assume the simple linear model (10.2.1). We further assume that $\alpha$ and $\beta$ are estimated by $a$ and $b$ (as calculated from the data $(X_1, Y_1), \dots, (X_n, Y_n)$) and that the parameter $\sigma^2$ describing the variability of data around the regression line is a known quantity.

First suppose for a particular deterministic x-value $X^*$ that we want to use the data to estimate the corresponding y-value $Y^* = \alpha + \beta X^*$ on the regression line by $\hat{Y}^* = a + bX^*$.

Theorem. The quantity $\hat{Y}^* = a + bX^*$ has a normal distribution with mean $Y^* = \alpha + \beta X^*$ and variance

$$\sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Proof - Recall from the two theorems of Section 10.4 that $a$ and $b$ are both linear combinations of the independent normal random variables $Y_j$. So $\hat{Y}^*$ is also such a linear combination and therefore has a normal distribution. We need to calculate only its mean and variance. The expected value is simple to calculate:

$$E[\hat{Y}^*] = E[a + bX^*] = E[a] + E[b]X^* = \alpha + \beta X^* = Y^*.$$

If $a$ and $b$ were independent, then calculating the variance of $\hat{Y}^*$ would also be a simple task, but this is typically not the case. However, from the Lemma of Section 10.4, we know that $b$ and $\bar{Y}$ are independent. To make use of this, using (10.3.3), we may rewrite the line in point-slope form around the point of averages:

$$\hat{Y}^* = \bar{Y} + b(X^* - \bar{X}).$$

From this we have

$$\begin{aligned}
\operatorname{Var}[\hat{Y}^*] &= \operatorname{Var}[\bar{Y} + b(X^* - \bar{X})] \\
&= \operatorname{Var}[\bar{Y}] + \operatorname{Var}[b]\,(X^* - \bar{X})^2 \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2 \\
&= \sigma^2\left(\frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).
\end{aligned}$$

Note that, over the possible values of $X^*$, this variance is minimal when $X^*$ is $\bar{X}$, the average value of the x-data. In this case $\operatorname{Var}[\hat{Y}^*] = \frac{\sigma^2}{n} = \operatorname{Var}[\bar{Y}]$, as expected. The further $X^*$ is from the average of the x-values, the more variance there is in predicting the point on the regression line.

Next suppose that, instead of trying to estimate a point on the regression line, we are trying to predict a new data point produced from the linear model.
Let $X^*$ now represent the x-value of some new data point and let $Y^* = \alpha + \beta X^* + \epsilon^*$, where $\epsilon^* \sim \text{Normal}(0, \sigma^2)$ and the random variable $\epsilon^*$ is assumed to be independent of all prior $\epsilon_j$ which produced the original data set. The following theorem addresses the distribution of the predictive error made when estimating $Y^*$ by the quantity $\hat{Y}^* = a + bX^*$.

Theorem. If $(X^*, Y^*)$ is a new data point, as described in the previous paragraph, then the predictive error in estimating $Y^*$ using the least squares line is $(a + bX^*) - Y^*$, which is normally distributed with mean $0$ and variance

$$\sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).$$

Proof - The expected value of the predictive error is zero since

$$E[(a + bX^*) - Y^*] = E[a] + E[b]X^* - E[\alpha + \beta X^* + \epsilon^*] = \alpha + \beta X^* - \alpha - \beta X^* - E[\epsilon^*] = 0.$$

Both quantities $a$ and $b$ are linear combinations of the $Y_j$ variables, and so

$$a + bX^* - Y^* = a + bX^* - \alpha - \beta X^* - \epsilon^* = (-\alpha - \beta X^*) + \text{a linear combination of } Y_1, Y_2, \dots, Y_n, \epsilon^*.$$

All $(n + 1)$ of the variables $Y_1, Y_2, \dots, Y_n, \epsilon^*$ are independent and have a normal distribution. As $(-\alpha - \beta X^*)$ is a constant, it follows from the above that $(a + bX^* - Y^*)$ has a normal distribution. Finally, to calculate the variance, we again rewrite $a + bX^*$ in point-slope form and exploit independence.

$$\begin{aligned}
\operatorname{Var}[a + bX^* - Y^*] &= \operatorname{Var}[\bar{Y} + b(X^* - \bar{X}) - (\alpha + \beta X^* + \epsilon^*)] \\
&= \operatorname{Var}[\bar{Y}] + \operatorname{Var}[b]\,(X^* - \bar{X})^2 + \operatorname{Var}[\epsilon^*] \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2}{(n-1)S_X^2}(X^* - \bar{X})^2 + \sigma^2 \\
&= \sigma^2\left(1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)S_X^2}\right).
\end{aligned}$$

Example. A mathematics professor at a large university is studying the relationship between scores on a preparation assessment quiz students take on the first day of class and their actual percentage score at the end of class. Assuming the simple linear model with $\sigma = 6$, he takes a random sample of 30 students and discovers their average score on the quiz is $\bar{X} = 54$ with a sample standard deviation of $S_X = 12$, while the average percentage score in the class is $\bar{Y} = 68$ with a sample standard deviation of $S_Y = 10$. The sample correlation is $r[X, Y] = 0.6$.

So according to the results above, the slope is $b = 0.6 \cdot 10/12 = 0.5$, the intercept is $a = 68 - 0.5(54) = 41$, and the least squares line for predicting the course percentage from the preliminary quiz will be $y = 0.5x + 41$. If we wish to use the line to predict the course percentage for someone who scores a 54 on the preliminary quiz, we would find $y = 0.5(54) + 41 = 68$, as expected, since someone who gets an average score on the quiz is likely to get around the average percentage in the class. Similarly, if we wish to use the line to predict the course percentage for someone who scores an 80 on the preliminary quiz, we would find $y = 0.5(80) + 41 = 81$. This is also not surprising. Due to the positive correlation, a student scoring above average on the quiz is also likely to score higher in the course.

The previous theorem allows us to go further and calculate a standard deviation associated with these estimates.
For the student who scores a 54 on the preliminary quiz, let $Y^*$ be the actual course percentage and let $a + bX^* = 68$ be the least squares line estimate we made above. Then,

$$\operatorname{Var}[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + \frac{(54 - 54)^2}{29 \cdot 12^2}\right) = 37.2$$

and so the standard deviation of the predictive error is $SD[a + bX^* - Y^*] \approx 6.1$. This means that students who make an average score of 54 on the preliminary quiz will have a range of percentages in the course. This range will have a normal distribution with mean 68 and standard deviation 6.1. We could then use normal curve computations to make further predictions about how likely such a student may be to reach a certain benchmark.

Next take the example of a student who scores 80 on the preliminary quiz. The least squares line predicts the course percentage for such a student will be $a + bX^* = 81$, but now

$$\operatorname{Var}[a + bX^* - Y^*] = 36\left(1 + \frac{1}{30} + \frac{(80 - 54)^2}{29 \cdot 12^2}\right) \approx 43.0$$

and so $SD[a + bX^* - Y^*] \approx 6.6$. Students who score an 80 on the preliminary quiz will have a range of course percentages with a normal distribution of mean 81 and standard deviation 6.6.

Thinking of the standard deviation as the likely error associated with prediction, this example suggests that predictions of data further from the mean will tend to have less accuracy than predictions near the mean. This is true in the simple linear model and will be explored in the exercises.
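The arithmetic in this example can be checked with a short script. The sketch below uses only the summary statistics quoted in the example; the function name `prediction_error_variance` and the variable names are hypothetical choices made for this illustration.

```python
import math


def prediction_error_variance(sigma, n, x_bar, s_x, x_new):
    """Variance of (a + b*x_new) - Y_new in the simple linear model with known sigma."""
    return sigma ** 2 * (1 + 1 / n + (x_new - x_bar) ** 2 / ((n - 1) * s_x ** 2))


# Summary statistics quoted in the example.
n, sigma = 30, 6.0
x_bar, s_x = 54.0, 12.0
y_bar, s_y = 68.0, 10.0
r = 0.6

b = r * s_y / s_x        # slope of the least squares line
a = y_bar - b * x_bar    # intercept, from the point of averages

results = {}
for x_new in (54.0, 80.0):
    variance = prediction_error_variance(sigma, n, x_bar, s_x, x_new)
    results[x_new] = (a + b * x_new, variance, math.sqrt(variance))
    print(f"quiz score {x_new:.0f}: prediction {a + b * x_new:.1f}, "
          f"variance {variance:.1f}, standard deviation {math.sqrt(variance):.1f}")
```

The standard deviation is larger for the quiz score of 80 because that score lies further from the sample average of 54.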

exercises

Ex. Using the figures from the example above (the preparation quiz study), do the following. Two students are selected independently at random. The first scored a 50 on the preliminary quiz while the second scored 60. Determine how likely it is that the student who scored the lower grade on the quiz will score a higher percentage in the course.

Ex. Explain why $\operatorname{Var}[a + bX^* - Y^*]$ is minimized when $X^* = \bar{X}$.

10.6 hypothesis testing and regression

As $a$ and $b$ both have a normal distribution under the assumption of the simple linear model, it is possible to perform tests of significance concerning the values of $\alpha$ and $\beta$. Of particular importance is a test with a null hypothesis that $\beta = 0$ and an alternate hypothesis $\beta \neq 0$. This is commonly called a test of utility. The reason for this name is that if $\beta = 0$, then the simple linear model produces output values $Y_j = \alpha + \epsilon_j$ which do not depend on the corresponding input $X_j$. Therefore knowing the value of $X_j$ should not be at all helpful in predicting the corresponding $Y_j$ result. However, if $\beta \neq 0$, then knowing $X_j$ should be at least somewhat useful in predicting the $Y_j$ value.

Example. Suppose $(X_1, Y_1), \dots, (X_{16}, Y_{16})$ follows the simple linear model with $\sigma = 5$ and produces a least squares line $y = a + 1.1x$ [intercept value missing]. Suppose the sample average of the $X_j$ data is 20 and the sample variance is $S_X^2 = 10$. What is the conclusion of a test of utility at a significance level of $\alpha = 0.05$?

From the given least squares line, $b = 1.1$. As noted above, a test of utility compares a null hypothesis that $\beta = 0$ to an alternate hypothesis $\beta \neq 0$, so this will be a two-tailed test. If the null were true, then $E[b] = 0$ and we can use the normal distribution to determine whether the 1.1 value is so far from zero that the null seems unreasonable.

Using the same sample mimicking idea introduced in Chapter 9, we let $Z_1, \dots, Z_{16}$ be random variables produced from $X_1, \dots, X_{16}$ via the simple linear model with $\beta = 0$. From the theorem on the distribution of $b$ in Section 10.4, the slope of the least squares line for the $(X_1, Z_1), \dots, (X_{16}, Z_{16})$ data has a normal distribution with mean $\beta = 0$ and variance $\frac{\sigma^2}{(n-1)S_X^2} = \frac{25}{15 \cdot 10} = \frac{1}{6}$.
Therefore we can calculate

$$P(|\text{slope of the least squares line}| \geq 1.1) = P\left(|Z| \geq \frac{1.1}{\sqrt{1/6}}\right) = 2P\left(Z < -\frac{1.1}{\sqrt{1/6}}\right) \approx 0.007,$$

where $Z \sim \text{Normal}(0, 1)$. As this P-value is less than the significance level, the test rejects the null hypothesis. That is, the test concludes that the slope of 1.1 is far enough from 0 that it demonstrates a true relationship between the $X_j$ input values and the $Y_j$ output values.

exercises

Ex. Continuing with the example above, use the theorem on the distribution of $a$ from Section 10.4 to devise a hypothesis test for determining whether or not the regression line goes through the origin. That is, determine whether or not $\alpha = 0$ is a plausible assumption.

10.7 estimating an unknown $\sigma^2$

In many cases the variance $\sigma^2$ of the points around the regression line will be an unknown quantity and so, like $\alpha$ and $\beta$, it too will need to be approximated using the $(X_1, Y_1), \dots, (X_n, Y_n)$ data. The following theorem provides an unbiased estimator for $\sigma^2$ using the data.

Theorem. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be data following the simple linear model with $n > 2$. Let

$$S^2 = \frac{1}{n-2}\sum_{j=1}^{n} (Y_j - (a + bX_j))^2.$$

Then $S^2$ is an unbiased estimator for $\sigma^2$. (That is, $E[S^2] = \sigma^2$.)

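Proof - Before looking at $E[S^2]$ in its entirety, we look at three quantities that will be helpful in computing this expected value. First note,

$$\begin{aligned}
\operatorname{Var}[Y_j - \bar{Y}] &= \operatorname{Var}\left[Y_j - \frac{Y_1 + Y_2 + \dots + Y_n}{n}\right] \\
&= \frac{1}{n^2}\operatorname{Var}\Big[(n-1)Y_j - \sum_{i=1,\, i \neq j}^{n} Y_i\Big] \\
&= \frac{1}{n^2}\Big[(n-1)^2\sigma^2 + \sum_{i=1,\, i \neq j}^{n} \sigma^2\Big] \\
&= \frac{1}{n^2}\left[(n-1)^2\sigma^2 + (n-1)\sigma^2\right] = \frac{n-1}{n}\,\sigma^2
\end{aligned}$$

and therefore,

$$\begin{aligned}
\sum_{j=1}^{n} E[(Y_j - \bar{Y})^2] &= \sum_{j=1}^{n} \left(\operatorname{Var}[Y_j - \bar{Y}] + (E[Y_j - \bar{Y}])^2\right) \\
&= \sum_{j=1}^{n} \left(\frac{n-1}{n}\,\sigma^2 + ((\alpha + \beta X_j) - (\alpha + \beta\bar{X}))^2\right) \\
&= \sum_{j=1}^{n} \left(\frac{n-1}{n}\,\sigma^2 + \beta^2 (X_j - \bar{X})^2\right) \\
&= (n-1)\sigma^2 + \beta^2 \sum_{j=1}^{n} (X_j - \bar{X})^2 \\
&= (n-1)\sigma^2 + \beta^2 (n-1)S_X^2.
\end{aligned} \tag{10.7.1}$$

Next,

$$\begin{aligned}
\sum_{j=1}^{n} E[b^2 (X_j - \bar{X})^2] &= E[b^2] \sum_{j=1}^{n} (X_j - \bar{X})^2 \\
&= \left(\operatorname{Var}[b] + (E[b])^2\right)(n-1)S_X^2 \\
&= \left(\frac{\sigma^2}{(n-1)S_X^2} + \beta^2\right)(n-1)S_X^2 \\
&= \sigma^2 + \beta^2 (n-1)S_X^2.
\end{aligned} \tag{10.7.2}$$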

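Also,

$$\begin{aligned}
E[bY_j] &= \operatorname{Cov}[b, Y_j] + E[b]E[Y_j] \\
&= \operatorname{Cov}\Big[\sum_{i=1}^{n} \frac{X_i - \bar{X}}{(n-1)S_X^2}\,Y_i,\ Y_j\Big] + \beta(\alpha + \beta X_j) \\
&= \sum_{i=1}^{n} \frac{X_i - \bar{X}}{(n-1)S_X^2}\operatorname{Cov}[Y_i, Y_j] + \beta(\alpha + \beta X_j) \\
&= \frac{X_j - \bar{X}}{(n-1)S_X^2}\operatorname{Var}[Y_j] + \beta(\alpha + \beta X_j) \\
&= \frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j),
\end{aligned}$$

from which we may determine, using the independence of $b$ and $\bar{Y}$ to write $E[\bar{Y}b] = E[\bar{Y}]E[b]$, that

$$\begin{aligned}
\sum_{j=1}^{n} E[(Y_j - \bar{Y})\,b\,(X_j - \bar{X})] &= \sum_{j=1}^{n} (X_j - \bar{X})E[Y_j b] - \sum_{j=1}^{n} (X_j - \bar{X})E[\bar{Y} b] \\
&= \sum_{j=1}^{n} (X_j - \bar{X})\left(\frac{X_j - \bar{X}}{(n-1)S_X^2}\,\sigma^2 + \beta(\alpha + \beta X_j)\right) - \sum_{j=1}^{n} (X_j - \bar{X})E[\bar{Y}]E[b] \\
&= \sum_{j=1}^{n} \frac{(X_j - \bar{X})^2}{(n-1)S_X^2}\,\sigma^2 + \sum_{j=1}^{n} (X_j - \bar{X})\beta(\alpha + \beta X_j) - \sum_{j=1}^{n} (X_j - \bar{X})(\alpha + \beta\bar{X})\beta \\
&= \sigma^2 + \sum_{j=1}^{n} (X_j - \bar{X})\,\beta^2 (X_j - \bar{X}) \\
&= \sigma^2 + \beta^2 (n-1)S_X^2.
\end{aligned} \tag{10.7.3}$$

Finally, putting together the results from equations (10.7.1), (10.7.2), and (10.7.3), we find

$$\begin{aligned}
E\Big[\sum_{j=1}^{n} (Y_j - (a + bX_j))^2\Big] &= E\Big[\sum_{j=1}^{n} (Y_j - (\bar{Y} + b(X_j - \bar{X})))^2\Big] \\
&= E\Big[\sum_{j=1}^{n} ((Y_j - \bar{Y}) - b(X_j - \bar{X}))^2\Big] \\
&= E\Big[\sum_{j=1}^{n} \left((Y_j - \bar{Y})^2 - 2(Y_j - \bar{Y})\,b\,(X_j - \bar{X}) + b^2 (X_j - \bar{X})^2\right)\Big] \\
&= \sum_{j=1}^{n} E[(Y_j - \bar{Y})^2] - 2\sum_{j=1}^{n} E[(Y_j - \bar{Y})\,b\,(X_j - \bar{X})] + \sum_{j=1}^{n} E[b^2 (X_j - \bar{X})^2] \\
&= \left((n-1)\sigma^2 + \beta^2 (n-1)S_X^2\right) - 2\left(\sigma^2 + \beta^2 (n-1)S_X^2\right) + \left(\sigma^2 + \beta^2 (n-1)S_X^2\right) \\
&= (n-2)\sigma^2.
\end{aligned}$$

Hence

$$E[S^2] = E\Big[\frac{1}{n-2}\sum_{j=1}^{n} (Y_j - (a + bX_j))^2\Big] = \sigma^2,$$

as desired.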
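The unbiasedness of $S^2$ can also be checked empirically. The sketch below is a Monte Carlo illustration under assumed values ($\alpha = 2$, $\beta = 0.5$, $\sigma = 2$, six fixed x-values), none of which come from the text; the function names are hypothetical. It averages $S^2$ over many simulated data sets and compares the result with $\sigma^2 = 4$.

```python
import random
import statistics


def residual_variance_estimate(xs, ys):
    """Return S^2 = (1 / (n - 2)) * sum of squared least squares residuals."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)


def average_estimate(xs, alpha, beta, sigma, repetitions, seed=0):
    """Average S^2 over `repetitions` data sets simulated from the simple linear model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        total += residual_variance_estimate(xs, ys)
    return total / repetitions


# Hypothetical model parameters; the true error variance is sigma^2 = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
estimate = average_estimate(xs, alpha=2.0, beta=0.5, sigma=2.0, repetitions=20000)
print("average of S^2 over simulations:", round(estimate, 3))
```

With the divisor $n - 2$ the average should land near 4. Dividing by $n$ instead would shrink every estimate by the factor $(n-2)/n$, which is why the theorem uses $n - 2$.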


More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Chapter 13: Tests of Hypothesis Section 13.1 Introduction

Chapter 13: Tests of Hypothesis Section 13.1 Introduction Chapter 13: Tests of Hypothesis Sectio 13.1 Itroductio RECAP: Chapter 1 discussed the Likelihood Ratio Method as a geeral approach to fid good test procedures. Testig for the Normal Mea Example, discussed

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Lesson 11: Simple Linear Regression

Lesson 11: Simple Linear Regression Lesso 11: Simple Liear Regressio Ka-fu WONG December 2, 2004 I previous lessos, we have covered maily about the estimatio of populatio mea (or expected value) ad its iferece. Sometimes we are iterested

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Algebra of Least Squares

Algebra of Least Squares October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal

More information

Statistics 20: Final Exam Solutions Summer Session 2007

Statistics 20: Final Exam Solutions Summer Session 2007 1. 20 poits Testig for Diabetes. Statistics 20: Fial Exam Solutios Summer Sessio 2007 (a) 3 poits Give estimates for the sesitivity of Test I ad of Test II. Solutio: 156 patiets out of total 223 patiets

More information

Section 14. Simple linear regression.

Section 14. Simple linear regression. Sectio 14 Simple liear regressio. Let us look at the cigarette dataset from [1] (available to dowload from joural s website) ad []. The cigarette dataset cotais measuremets of tar, icotie, weight ad carbo

More information

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece

More information

Asymptotic Results for the Linear Regression Model

Asymptotic Results for the Linear Regression Model Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics 8.2 Testig a Proportio Math 1 Itroductory Statistics Professor B. Abrego Lecture 15 Sectios 8.2 People ofte make decisios with data by comparig the results from a sample to some predetermied stadard. These

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n. ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic

More information

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}. 1 (*) If a lot of the data is far from the mea, the may of the (x j x) 2 terms will be quite large, so the mea of these terms will be large ad the SD of the data will be large. (*) I particular, outliers

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS PART of UNIVERSITY OF TORONTO Faculty of Arts ad Sciece APRIL/MAY 009 EAMINATIONS ECO0YY PART OF () The sample media is greater tha the sample mea whe there is. (B) () A radom variable is ormally distributed

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still

More information

Common Large/Small Sample Tests 1/55

Common Large/Small Sample Tests 1/55 Commo Large/Small Sample Tests 1/55 Test of Hypothesis for the Mea (σ Kow) Covert sample result ( x) to a z value Hypothesis Tests for µ Cosider the test H :μ = μ H 1 :μ > μ σ Kow (Assume the populatio

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Ismor Fischer, 1/11/

Ismor Fischer, 1/11/ Ismor Fischer, //04 7.4-7.4 Problems. I Problem 4.4/9, it was show that importat relatios exist betwee populatio meas, variaces, ad covariace. Specifically, we have the formulas that appear below left.

More information

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions

University of California, Los Angeles Department of Statistics. Practice problems - simple regression 2 - solutions Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 00C Istructor: Nicolas Christou EXERCISE Aswer the followig questios: Practice problems - simple regressio - solutios a Suppose y,

More information

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notatio Math 113 - Itroductio to Applied Statistics Name : Use Word or WordPerfect to recreate the followig documets. Each article is worth 10 poits ad ca be prited ad give to the istructor

More information

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman: Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion 1 Chapter 7 ad 8 Review for Exam Chapter 7 Estimates ad Sample Sizes 2 Defiitio Cofidece Iterval (or Iterval Estimate) a rage (or a iterval) of values used to estimate the true value of the populatio parameter

More information

Regression and Correlation

Regression and Correlation 43 Cotets Regressio ad Correlatio 43.1 Regressio 43. Correlatio 17 Learig outcomes You will lear how to explore relatioships betwee variables ad how to measure the stregth of such relatioships. You should

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information

Polynomial Functions and Their Graphs

Polynomial Functions and Their Graphs Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

(all terms are scalars).the minimization is clearer in sum notation:

(all terms are scalars).the minimization is clearer in sum notation: 7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1

More information

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Matrix Representation of Data in Experiment

Matrix Representation of Data in Experiment Matrix Represetatio of Data i Experimet Cosider a very simple model for resposes y ij : y ij i ij, i 1,; j 1,,..., (ote that for simplicity we are assumig the two () groups are of equal sample size ) Y

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

Economics Spring 2015

Economics Spring 2015 1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures

More information

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So, 0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical

More information

Describing the Relation between Two Variables

Describing the Relation between Two Variables Copyright 010 Pearso Educatio, Ic. Tables ad Formulas for Sulliva, Statistics: Iformed Decisios Usig Data 010 Pearso Educatio, Ic Chapter Orgaizig ad Summarizig Data Relative frequecy = frequecy sum of

More information