Economics 400 -- Spring 2015
2/17/2015
pp. 30-38; Ch. 7.1.4-7.

New Stata Assignment and new MyStatlab assignment, both due Feb 24th.
Midterm Exam Thursday, Feb 26th: Chapters 1-7 of the Groebner text and all relevant lectures, handouts, and computer & book exercises. Bring a calculator & a pencil with a functioning eraser!
Review Session 7:30 pm on Monday, Feb 23rd, Gardner 105.

Sampling distribution of the sample mean, normal population, sigma unknown (the Student's t-distribution)
Using the t-distribution when the population is not normal
Sampling Distribution of the Sample Proportion
End of Midterm 1 Material
7 Sampling Distribution of $\bar{x}$, Normal Population, $\sigma$ Unknown

We can immediately use our estimator of the sample variance to help us out of a problem that occurs when we try to estimate the population mean from a normal population where the population standard deviation is unknown. It turns out that $s^2$ is a good estimator of the population variance: one can prove that the expected value of $s^2$ is equal to the population variance, $E(s^2) = \sigma^2$. To do so we use the rules of expectations that we developed earlier. Here's a handout that shows you how to do this proof: {Next Slide}

Proof that the Sample Variance is an Unbiased Estimator of the Population Variance

Early in the course I claimed that the "best" estimator of the population variance, sigma-squared, is s-squared, defined as:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1},$$

even though it would seem that a better estimator would be:

$$s'^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}.$$

So, let $x_1, x_2, \ldots, x_n$ be a random sample with $E(x_i) = \mu$ and $V(x_i) = \sigma^2$. Show that $s'^2$ is a biased estimator for $\sigma^2$ and $s^2$ is an unbiased estimator for $\sigma^2$.

First, with some basic algebra (which I'll leave to you) we can demonstrate that:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2.$$

Then, we can write the expected value of this sum of squared differences as:

$$E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = E\left(\sum_{i=1}^{n} x_i^2\right) - nE(\bar{x}^2) = \sum_{i=1}^{n} E(x_i^2) - nE(\bar{x}^2).$$

Notice that $E(x_i^2)$ is the same for $i = 1, 2, \ldots, n$. We use this and the fact that the variance of a random variable is given by $V(x) = E(x^2) - [E(x)]^2$ to conclude that $E(x_i^2) = V(x_i) + [E(x_i)]^2 = \sigma^2 + \mu^2$, that $E(\bar{x}^2) = V(\bar{x}) + [E(\bar{x})]^2 = \sigma^2/n + \mu^2$, and that

$$E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \sum_{i=1}^{n} (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = n\sigma^2 - \sigma^2 = (n-1)\sigma^2.$$

It follows that

$$E(s'^2) = \frac{1}{n} E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \frac{1}{n}(n-1)\sigma^2 = \frac{n-1}{n}\sigma^2$$

and that $s'^2$ is biased because $E(s'^2) \neq \sigma^2$. However,

$$E(s^2) = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2,$$

so we see that $s^2$ is an unbiased estimator for $\sigma^2$.

Two rules of expectations do the work here: the expected value of a difference is the difference of the expected values, and the expected value of a sum is the sum of the expected values.

SampleDist.lwp Lecture on Sampling Distributions Page 28 of 37

Now, we know that when we're estimating a population mean from a normal population with a known variance, our estimator's distribution is exactly normal. So, if we want to calculate probabilities for a normally distributed estimator of the population mean we can convert to z-scores:

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

What happens when we substitute the sample standard deviation for sigma?

$$\frac{\bar{x} - \mu}{s_{\bar{x}}}, \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}.$$

Intuitively you can guess that the mean of the standardized variable is still zero, since the numerator has not been affected by the substitution. In terms of the variance, we should expect the variance of $(\bar{x} - \mu)/s_{\bar{x}}$ to be larger than the variance of $(\bar{x} - \mu)/\sigma_{\bar{x}}$, since one more element of uncertainty has been added to the ratio. Finally, we should expect the ratio to be symmetrical, since there is no reason to believe that substituting s for sigma will make this distribution skewed either positively or negatively.

Note also that the variability of the distribution depends upon the size of n, for the sample size affects the reliability with which s estimates $\sigma$. When n is large, s will be a good approximation to $\sigma$; but when n is small, s may not be very close to $\sigma$. Hence, the distribution of $(\bar{x} - \mu)/s_{\bar{x}}$ is a family of distributions whose variability depends upon n.

So, I hope that it's clear from this discussion that the distribution of $(\bar{x} - \mu)/s_{\bar{x}}$ is not normal, but is more spread out than normal. The distribution of this statistic is called the t-distribution and its random variable is denoted as

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}.$$

This is the famous distribution that was discovered by a statistician named W. S. Gosset, an Englishman who worked for the Guinness Brewery in Dublin. The brewery wouldn't let him publish his research under his own name, so he published under the pseudonym "Student". In honor of Gosset's research, published in 1908, the t-distribution is often called Student's t-distribution.

The t-distribution is a fairly complex function, and I won't present it here. Let me list its characteristics:

The t-distribution depends upon the size of the sample. It is customary to describe the characteristics of the t-distribution in terms of the sample size minus one, or (n-1), as this quantity has special significance. The value of (n-1) is called the number of degrees of freedom (abbreviated d.f.), and represents a measure of the number of observations in the sample that can be used to estimate the standard deviation of the parent population. For example, when n=1, there is no way to estimate the population standard deviation; hence there are no degrees of freedom (n-1=0). There is one degree of freedom in a sample of n=2, since one observation is now free to vary away from the other, and the amount it varies determines our estimate of the population standard deviation. Each additional observation adds one more degree of freedom, so that, in a sample of size n, there are (n-1) observations free to vary, and hence (n-1) degrees of freedom. The Greek letter $\nu$ (nu) is often used to denote degrees of freedom.

When sample sizes are small, the t-distribution is seen to be considerably more spread out than the standard normal distribution. That is, its tails are thicker: {Next Slide}

[Figure: Comparing the t- and normal distributions. Density curves for the t-distribution (df = 3) and the standardized normal distribution, plotted against z- and t-values.]

Here we compare a t-distribution with degrees of freedom = 3 to the standard normal distribution. You can see that the t-distribution has considerably more area under its tails outside of 2 standard deviations; however, as degrees of freedom get large, the t-distribution approaches the normal distribution.

Because the t-distribution is really a family of distributions it would be very difficult to carry around tables for all possible t-distributions. Instead, tables are usually published that contain the critical values of t for a few selected tail probabilities. Here's the t-table out of your textbook: {Next Slide}

Each entry in the table is the value of t that leaves the stated amount of area under the curve to its right. So, with degrees of freedom = 1, t must equal 31.821 {Next Slide} to have one percent of the total area to the right. On the other hand, t need only be 3.747 if the degrees of freedom are equal to 4. {Next Slide} At degrees of freedom = 29 the t-value at one percent is 2.462, which is very close to the z-value of 2.326 of the normal distribution for a 1-percent right-tail probability.

d.f.    t.100    t.050    t.025    t.010    t.005
  1     3.078    6.314   12.706   31.821   63.657
  2     1.886    2.920    4.303    6.965    9.925
  3     1.638    2.353    3.182    4.541    5.841
  4     1.533    2.132    2.776    3.747    4.604
  5     1.476    2.015    2.571    3.365    4.032
  6     1.440    1.943    2.447    3.143    3.707
  7     1.415    1.895    2.365    2.998    3.499
  8     1.397    1.860    2.306    2.896    3.355
  9     1.383    1.833    2.262    2.821    3.250
 10     1.372    1.812    2.228    2.764    3.169
 11     1.363    1.796    2.201    2.718    3.106
 12     1.356    1.782    2.179    2.681    3.055
 13     1.350    1.771    2.160    2.650    3.012
 14     1.345    1.761    2.145    2.624    2.977
 15     1.341    1.753    2.131    2.602    2.947
 16     1.337    1.746    2.120    2.583    2.921
 17     1.333    1.740    2.110    2.567    2.898
 18     1.330    1.734    2.101    2.552    2.878
 19     1.328    1.729    2.093    2.539    2.861
 20     1.325    1.725    2.086    2.528    2.845
 21     1.323    1.721    2.080    2.518    2.831
 22     1.321    1.717    2.074    2.508    2.819
 23     1.319    1.714    2.069    2.500    2.807
 24     1.318    1.711    2.064    2.492    2.797
 25     1.316    1.708    2.060    2.485    2.787
 26     1.315    1.706    2.056    2.479    2.779
 27     1.314    1.703    2.052    2.473    2.771
 28     1.313    1.701    2.048    2.467    2.763
 29     1.311    1.699    2.045    2.462    2.756
inf.    1.282    1.645    1.960    2.326    2.576

7.1. Example Using the t-distribution

{Next Slide} To see how to use the t-distribution, let's examine the widely publicized claims of a well-known neighboring university that its students have I.Q.'s which are normally distributed with a mean of 130. Suppose that I were able to obtain, through methods that I cannot reveal, a random sample of the I.Q.'s of 25 students. This random sample has a mean of 126.8 and a standard deviation of s=6. {Next Slide}

What is the probability of receiving a sample mean of 126.8, or lower, if $\mu = 130$? {Next Slide}

First, convert the sample mean to a t-score assuming that the population mean really is equal to 130: {Next Slide}
$$P(\bar{x} \le 126.8) = P\left(\frac{\bar{x} - \mu}{s/\sqrt{n}} \le \frac{126.8 - 130}{6/\sqrt{25}}\right) = P\left(t \le \frac{-3.2}{1.2}\right) = P(t \le -2.667).$$

Now, since the t-distribution is symmetrical, the probability that $t \le -2.667$ is equal to the probability that $t \ge 2.667$. The degrees of freedom for this sample are {Next Slide}: df = 25 - 1 = 24. So, looking at the table under df=24 we don't find a direct match, but we do see that 2.667 lies between 2.492 and 2.797. So, the probability that we would get a sample mean of 126.8 or less, if the true mean were 130, is only between one-half of one percent and one percent! So the data cast serious doubt on the claim that the true mean I.Q. of this unnamed university's students is 130. {Next Slide}

The next slide shows that the area to the left of -2.667 equals the area to the right of +2.667, and that this area is somewhere between 0.005 and 0.01. {Next Slide - clicks}

[Figure: Student's t-distribution of the sample mean, 24 degrees of freedom. Density plotted against t-value; the area to the left of -2.667 and the area to the right of +2.667 are the same.]

Oh, by the way, we've just done some statistical inference: We asked the question, what's the probability of drawing a random sample with a mean of 126.8 or lower if the true population mean is 130? The answer was: quite low.
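If you'd rather not interpolate in the table, the tail area for the I.Q. example can be computed numerically. The sketch below is only an illustration, not part of the textbook method: it assumes the standard formula for the Student's t density (which the lecture does not present) and integrates it with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are made up for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: one half minus the area between 0 and t.

    The area between 0 and t is approximated with Simpson's rule
    (steps must be even).
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Numbers from the I.Q. example: n = 25, sample mean 126.8, s = 6, claimed mean 130.
n, xbar, s, mu0 = 25, 126.8, 6.0, 130.0
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# By symmetry, P(t <= -2.667) equals P(t >= 2.667).
p_lower = t_upper_tail(abs(t_stat), n - 1)

print("t statistic:", round(t_stat, 3))
print("P(sample mean <= 126.8 | mu = 130):", p_lower)
```

The printed probability should fall between 0.005 and 0.01, in line with what we read off the row for df = 24.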
7.2. Using the t-distribution when the population is not Normal

Now, let me emphasize again that the t-distribution assumes that samples are drawn from a parent population that is normally distributed. In practical problems involving this distribution, the question is: just how critical is this assumption of normality in the parent population? Often, we can't determine the distribution of the parent population, so it becomes difficult to know if using the t-distribution is appropriate. Fortunately, the assumption of normality can be relaxed without significantly changing the sampling distribution of the t statistic. Because of this, the t-distribution is said to be quite robust, implying that its usefulness holds up well under conditions that do not exactly conform to the original normality assumption.

So, let's again emphasize several important aspects of the sampling distribution of $\bar{x}$ when n is large: {Next Slide}

- When n is large (n > 30), $\bar{x}$ will at a minimum be approximately normally distributed.
- When n is large, s will usually be a good approximation to sigma.
- In that case, the distribution of $t = (\bar{x} - \mu)/(s/\sqrt{n})$ and that of $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ will be approximately the same.
- So, for large samples we can use the standard normal distribution to approximate the t-distribution.

8 Sampling Distribution of the Sample Proportion

{Next Slide} Let's say that we're doing a political poll about the intentions of a randomly selected set of voters to vote for the current president at the next election. The respondents to the poll will respond "yes" or "no" and we want to estimate the probability that a randomly chosen voter will vote for the president. We can approximate this unknown probability, p, with the sample proportion, {Next Slide}

$$\hat{p} = \frac{x}{n},$$

where $\hat{p}$ is the estimated probability, x is the number of "yes" answers in the sample, and n is the size of the sample. That is, we estimate the underlying probability with the sample proportion.

Since each distinct value of x results in a distinct value of $\hat{p} = x/n$, the probabilities associated with $\hat{p}$ are equal to the probabilities associated with the corresponding values of x. Hence, the sampling distribution of $\hat{p}$ will be the same shape as the binomial probability distribution for x. Like the binomial probability distribution, it can be approximated by a normal probability distribution when the sample size is large.

Now, the expected value of the sample proportion is: {Next Slide}

$$\mu_{\hat{p}} = E(\hat{p}) = E\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{1}{n}\,np = p,$$

and the variance of the sample proportion is {Next Slide}

$$\sigma_{\hat{p}}^2 = V(\hat{p}) = V\left(\frac{x}{n}\right) = \frac{1}{n^2}V(x) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n},$$

so the standard error of the sample proportion is {Next Slide}

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}.$$
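One way to convince yourself of the expected value and standard error formulas for the sample proportion is to compute them directly from the binomial distribution for x. The sketch below does that for a hypothetical poll; the values n = 50 and p = 0.3 and the helper name `phat_moments` are assumptions chosen for illustration and do not come from the lecture.

```python
import math

def phat_moments(n, p):
    """Exact mean and variance of p_hat = x/n when x is Binomial(n, p).

    Sums over every possible count x = 0, 1, ..., n, weighting each
    value of x/n by its binomial probability.
    """
    mean = 0.0
    second_moment = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p**x * (1 - p) ** (n - x)
        mean += (x / n) * prob
        second_moment += (x / n) ** 2 * prob
    # V(p_hat) = E(p_hat^2) - [E(p_hat)]^2
    return mean, second_moment - mean**2

n, p = 50, 0.3  # hypothetical poll: 50 respondents, true probability 0.3
mean, var = phat_moments(n, p)

print("E(p_hat) from the binomial:", mean, " formula p:", p)
print("std. error from the binomial:", math.sqrt(var),
      " formula sqrt(p(1-p)/n):", math.sqrt(p * (1 - p) / n))
```

Each pair of printed numbers should agree up to floating-point rounding, whatever values of n and p you try.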
8.1. Example

It's not widely known, but a substantial proportion of supermarket scanning machines make mistakes when items are scanned in. The North Carolina Division of Weights and Measures tests a store's scanners by randomly selecting 300 register tapes and verifying whether or not there's an error on the tape. Stores are fined if the error rate is more than 2 percent. Suppose 8 tapes show errors; what's the probability of getting 8 or more errors if the true error rate is, in fact, 2 percent (or 6 errors)?

Let's approximate the binomial distribution with a normal distribution with: {Next Slide}

$$\hat{p} = \frac{x}{n} = \frac{8}{300} = 0.02667$$

and, {Next Slide}

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.00930.$$

Then, we calculate the z-value under the assumption that the true probability of error is {Next Slide} $p = 0.02$, and we get {Next Slide}

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.02667 - 0.02}{0.00930} \approx 0.717,$$

so, looking at the normal table, we see that the probability of getting an error rate of 2.7% or higher is 0.5 - 0.2623 = 0.2377, or 23.7 percent, even if the true probability of error is only 2.0%: {Next Slide}

[Figure: Standardized normal distribution, with the area to the right of z = 0.717 marked as 0.5 - 0.2623 = 0.2377.]

So, it's not too unlikely that we'd get an error rate of 2.7% even if the machines are operating within regulatory specs at 2.0%.
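The scanner calculation is easy to check without a normal table. The sketch below follows the same steps as the example, including its choice to plug the sample proportion into the standard error, and uses `NormalDist` from Python's standard `statistics` module for the normal tail area. The variable names are just for illustration.

```python
import math
from statistics import NormalDist

n = 300       # register tapes sampled
errors = 8    # tapes showing an error
p0 = 0.02     # assumed true error rate (the regulatory limit)

p_hat = errors / n
# Standard error as computed in the example, using the sample proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)
z = (p_hat - p0) / se

# Area under the standard normal curve to the right of z.
upper_tail = 1 - NormalDist().cdf(z)

print("p_hat:", round(p_hat, 5))
print("standard error:", round(se, 5))
print("z:", round(z, 3))
print("P(error rate this high or higher):", upper_tail)
```

The tail area printed here should come out close to the 23.7 percent we read from the table; a small difference in the third decimal place is just table rounding.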