Social Studies 201
Notes for March 18, 2005

Estimation of a mean, small sample size

Section 8.4, p. 501.

When a researcher has only a small sample size available, the central limit theorem does not apply to the distribution of sample means. In this case, if certain assumptions are made, the t-distribution can be used to describe the distribution of sample means. From this, an interval estimate of the population mean µ can be constructed.

The t-distribution

The t-distribution is sometimes referred to as Student's t-distribution. A table of the t-distribution is contained in Appendix I, p. 911, of the text. This distribution has a shape that is very similar to that of the normal distribution, and has the same interpretation and use as the normal distribution in that it is symmetrical about the centre, peaked in the centre, and trails off toward the horizontal axis in each direction from the centre. Like the standardized normal distribution, the t-distribution has a mean of 0. In the same way that Z-values are positions along the X-axis, t-values are measured along the horizontal or X-axis and are read in the same way: a t-value of 1.50 is associated with a point on the horizontal axis 1.5 units to the right of centre, just as a Z-value of 1.50 would be. (Strictly speaking, the standard deviation of the t-distribution is slightly larger than the standard deviation of 1 of the standardized normal distribution: with more than 2 degrees of freedom, a concept explained below, it equals $\sqrt{df/(df-2)}$, which approaches 1 as df grows.)

One difference between the t and normal distributions is that the t-distribution is a little more spread out than the normal. One way of picturing the t-distribution is to imagine taking the normal distribution and stretching it to the left and right. See the diagram on p. 503 of the text, where one distribution is superimposed on the other.

Degrees of freedom. Another difference between the t and normal distributions is that there is a different t-distribution for each number of degrees of freedom (df), a new concept that is related to sample size. As a concept, degrees of freedom is a little difficult to explain at this stage of the course; it refers to how many sample values are free to vary and how many are constrained. In the case of estimation of a mean, there are n − 1 degrees of freedom, one less than the sample size of n. When estimating a mean from n sample values, any n − 1 values are free to vary, but one value is fixed, or constrained, by the fact that a particular value of the mean must result. For example, if three values must have a mean of 10, the first two can be anything, but once they are chosen as 8 and 9, the third must be 13. If you find this confusing, for now just accept that in estimating the mean the degrees of freedom is the sample size minus one, that is, df = n − 1.

To return to the t-distribution, when there are few degrees of freedom, the distribution is very dispersed. For example, when there are only 4 degrees of freedom, the middle 95% of the t-distribution requires including the area from −2.776 to +2.776. This is in contrast to the corresponding Z-values of ±1.96 from the normal distribution. But as the number of degrees of freedom increases, the t-distribution approaches the normal distribution. Going down the column of the t-table (p. 911) associated with the 95% confidence level, if there is a sample size of 25, meaning df = n − 1 = 25 − 1 = 24 degrees of freedom, the t-value is 2.064, considerably less than the 2.776 for 4 degrees of freedom. As the sample size, and the corresponding degrees of freedom, become larger, the t-values keep moving closer to the normal values. To see this, examine the last row of the t-table. For a very large number of degrees of freedom (labelled infinite), the t-value for 95% confidence is 1.96, exactly the same as the corresponding Z-value from the table of the normal distribution. For most purposes, when the sample size reaches 30, we use the normal distribution. For 29 degrees of freedom and 95% confidence, the t-value is 2.045, not much larger than the 1.96 associated with the normal distribution.

The table of the t-distribution on p. 911 lists various confidence levels across the top. You are thus restricted to obtaining confidence intervals for the confidence levels listed there.
But the table provides the t-values associated with common confidence levels such as 80%, 90%, 95%, and 99%. To use the table, pick the proper confidence level and the associated degrees of freedom (sample size minus 1); the t-value given in the table is the value such that the chosen percentage of the area under the t-distribution lies between the negative of that t-value and the t-value itself.
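The way the 95% t-values shrink toward 1.96 can be checked numerically. The sketch below is an illustration only: instead of reading the table on p. 911, it integrates the t density with Simpson's rule and uses a bisection search to find the t-value that leaves the chosen central area between −t and +t. The function names are hypothetical, and only Python's standard library is assumed; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_central_area(t: float, df: int, steps: int = 2000) -> float:
    """Area under the t curve between -t and +t.

    Simpson's rule on [0, t] (steps must be even), doubled by symmetry.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """t-value with `confidence` of the area between -t and +t (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_area(mid, df) < confidence:
            lo = mid  # not enough central area yet: move the cut-off outward
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% Z-value
    for df in (4, 11, 15, 24, 29):
        print(f"df = {df:2d}: t = {t_critical(0.95, df):.3f}")
    print(f"normal:  Z = {z:.3f}")
```

Run as a script, it should print values that agree with the 95% column of the table to three decimals (2.776 for 4 degrees of freedom, 2.201 for 11, 2.131 for 15, 2.064 for 24, 2.045 for 29), followed by Z = 1.960.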
Distribution of the sample mean, small n

See text, p. 507. Under certain assumptions, the t-distribution can be used to obtain interval estimates for the mean of certain distributions. This section outlines the conditions for this. Strictly speaking, the t-distribution can only be used if samples are drawn from a normally distributed population. That is, if a researcher has some assurance that the characteristic of the population being examined is normally distributed, then the means of small random samples from this population have a t-distribution.

This result can be stated as follows. Suppose a normally distributed population has a mean of µ and a standard deviation of σ. If random samples of small sample size n (less than 30 cases) are drawn from this population, then the sample means $\bar{X}$ of these samples have a t-distribution with mean µ, standard deviation $s/\sqrt{n}$, and n − 1 degrees of freedom, where s is the standard deviation obtained from the sample.

This can be stated symbolically. If X is Nor(µ, σ), where µ and σ are unknown, and if random samples of size n are drawn from this population, then

$$\bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right)$$

where d = n − 1 is the degrees of freedom and $\bar{X}$ and s are the mean and standard deviation, respectively, from the sample.

When the sample size n is small, say less than n = 30, the t-distribution should be used. If n > 30, then the t-values become so close to the standardized normal values Z that the Central Limit Theorem can be used to describe the sampling distribution of $\bar{X}$. That is, $t \to Z$ as $n \to \infty$. This means that the t-distribution is likely to be used only when the sample size is small. For larger sample sizes, $\bar{X}$ may still have a t-distribution, but if the sample size is large enough, the normal values are so close to the t-values that the normal values are ordinarily used.

There are two assumptions associated with this result.

1. As is the case with larger sample sizes, the samples are to be random samples from the population. If the samples are not random samples, it is difficult to determine what the distribution of the sample means might be.
2. Unlike the central limit theorem, which generally holds regardless of the nature of the distribution of the variable, the t-distribution requires sampling from a normally distributed population. This is quite a restrictive assumption, since few populations are likely to be exactly normally distributed.

In practice, the t-distribution is often used even when there is no assurance that the sample is drawn from a normally distributed population. If a researcher considers the population to be very different from a normal distribution, perhaps the t-distribution should not be used. But if the population is not distributed all that differently from a normal distribution, little error may be introduced by using the t-distribution.

I generally argue that, in the case of sample sizes less than 30, it is always better to use the t-distribution than the normal distribution when conducting interval estimates or hypothesis tests. The reason for this is that the t-distribution is more dispersed than the normal distribution, so it gives a better picture of the precision of the sample. If a normal distribution is used, interval estimates may be reported as narrower than they really are in practice. By using the t-distribution, a researcher is less likely to make results look more precise than they really are.

Interval estimates for the mean, small sample size

The t-distribution for $\bar{X}$ can be used to obtain interval estimates of a population mean µ. The method is the same for this small sample method as it is for the large sample method. That is, the same series of five steps can be used.
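The five steps can be sketched in code. This is a minimal illustration under stated assumptions, not a general routine: the function name is hypothetical, only the 95% level is handled, and the t-values come from a hypothetical excerpt of the p. 911 table holding just the 95% values quoted in these notes. It applies the interval $\bar{X} \pm t_d\, s/\sqrt{n}$ with d = n − 1, as developed in the rest of this section.

```python
import math

# Hypothetical excerpt of the 95% column of the t-table (p. 911): only the
# values quoted in these notes. Other degrees of freedom need the full table.
T_TABLE_95 = {4: 2.776, 11: 2.201, 15: 2.131, 24: 2.064, 29: 2.045}


def five_step_interval(mean: float, sd: float, n: int) -> dict:
    """Sketch of the five-step 95% interval estimate for a small sample."""
    # Step 1: the sample mean, standard deviation and size come from the data.
    # Step 2: assuming a normal population, X-bar is t_d(mu, s/sqrt(n)), d = n - 1.
    d = n - 1
    std_error = sd / math.sqrt(n)
    # Step 3: the confidence level; this sketch only handles C = 95%.
    confidence = 0.95
    # Step 4: look up t_d (raises KeyError if d is not in the excerpt above).
    t_d = T_TABLE_95[d]
    # Step 5: the interval X-bar +/- t_d * s / sqrt(n).
    half_width = t_d * std_error
    return {
        "df": d,
        "confidence": confidence,
        "t": t_d,
        "std_error": std_error,
        "interval": (mean - half_width, mean + half_width),
    }


if __name__ == "__main__":
    result = five_step_interval(12.20, 3.27, 12)  # male workers, Table 1
    low, high = result["interval"]
    print(f"df = {result['df']}, t = {result['t']}, interval = ({low:.2f}, {high:.2f})")
```

With the male figures from Table 1 in the example later in these notes (mean 12.20, s = 3.27, n = 12), it should print df = 11, t = 2.201 and the interval (10.12, 14.28). A degrees-of-freedom value missing from the excerpt raises a KeyError rather than guessing a t-value, which keeps the table as the authority.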
If the population mean µ is to be estimated, and the sample is a random sample of size n with sample mean $\bar{X}$ and sample standard deviation s, and if the population from which this sample is drawn is normally distributed, then

$$\bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right)$$

where d = n − 1. In order to obtain an interval estimate, the researcher picks a confidence level C% and uses this to determine the appropriate t-value from the t-table on p. 911. For d degrees of freedom, let $t_d$ be the t-value such that C% of the area under the t curve lies between $-t_d$ and $t_d$. The C% confidence interval is then

$$\bar{X} \pm t_d \frac{s}{\sqrt{n}}$$

or, in interval form,

$$\left(\bar{X} - t_d \frac{s}{\sqrt{n}},\ \bar{X} + t_d \frac{s}{\sqrt{n}}\right).$$

Note that this is the same formula as for the confidence interval when the sample size is large; the only difference is that $t_d$ replaces the Z-value. Note also that the sample standard deviation s is used in the formula, rather than σ. The latter was used when presenting the formula for the interval estimate in the case of a large sample size. But even in the case of a large sample size, σ is almost always unknown, so that in practice s is used as an estimate of σ in that formula.

The interpretation of the confidence interval estimate is also the same as earlier. That is, if random samples of size n are drawn from the population, C% of the intervals $\bar{X} \pm t_d\, s/\sqrt{n}$ contain µ, where d = n − 1. Any specific interval which is constructed will either contain µ or it will not contain µ, but the researcher can be confident that C% of these intervals will be wide enough so that µ will be in the interval.
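The repeated-sampling interpretation, and the earlier warning that the normal distribution can make small-sample intervals look more precise than they are, can both be illustrated by simulation. The sketch below is purely hypothetical: it assumes a normal population with µ = 50 and σ = 10, draws 10,000 random samples of size n = 5 with Python's `random` module and a fixed seed, and counts how many of the intervals $\bar{X} \pm m\, s/\sqrt{n}$ contain µ when m is the table value $t_4 = 2.776$ and when m is Z = 1.96.

```python
import math
import random

MU, SIGMA, N = 50.0, 10.0, 5   # hypothetical normal population and sample size
T_4 = 2.776                    # 95% t-value for d = N - 1 = 4 (t-table)
Z = 1.96                       # 95% Z-value from the normal table
TRIALS = 10_000


def coverage(multiplier: float, seed: int = 1) -> float:
    """Share of intervals X-bar +/- multiplier * s / sqrt(N) that contain MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        x_bar = sum(sample) / N
        # sample standard deviation s, using the n - 1 divisor
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))
        half_width = multiplier * s / math.sqrt(N)
        if x_bar - half_width <= MU <= x_bar + half_width:
            hits += 1
    return hits / TRIALS


if __name__ == "__main__":
    print(f"t-based intervals containing mu: {coverage(T_4):.3f}")
    print(f"Z-based intervals containing mu: {coverage(Z):.3f}")
```

Because both calls use the same seed, they see the same samples, and every Z-based interval that contains µ sits inside a wider t-based interval that also contains it. The t-based share should come out close to 0.95, while the Z-based share falls short (in theory about 0.88 with 4 degrees of freedom).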
Example: wages of workers after plant shutdown

In "Bringing Globalization Down to Earth: Restructuring and Labour in Rural Communities," in the August 1995 issue of the Canadian Review of Sociology and Anthropology, the authors Belinda Leach and Anthony Winson examine changes in wages of workers after a plant shutdown. Before the shutdown, mean male wages were $13.76 per hour and mean female wages were $11.80 per hour. After the shutdown, some of the workers found new jobs, and the data from small samples of such workers are contained in Table 1. Using data in this table, obtain 95% interval estimates for the true mean wages of (i) male workers after the plant shutdown, and (ii) female workers after the plant shutdown. From these interval estimates, comment on whether there is strong evidence that the wages of male and female workers have declined.

Table 1: Data on Hourly Wages of Workers with Jobs, After Plant Shutdown

Type of      Hourly wage in dollars      Sample
worker       Mean        St. dev.        size
Male         12.20       3.27            12
Female        8.11       3.53            12

Answer

For the first part, the parameter to be estimated is µ, the true mean wage for all male workers who lost jobs because of the plant shutdown. Organizing the answer in terms of the five steps involved in interval estimation (see notes of November 10), the answer is as follows.

1. The sample mean $\bar{X}$, standard deviation s, and sample size n are given in Table 1. The sample size of n = 12 is small in this case, so it will be necessary to use the t-distribution.
2. Assuming the distribution of wages of all male workers who lost jobs in the shutdown is a normal distribution, the distribution of $\bar{X}$ is a t-distribution with mean µ and standard deviation $s/\sqrt{n}$, with n − 1 = 12 − 1 = 11 degrees of freedom. That is,
$$\bar{X} \text{ is } t_d\left(\mu, \frac{s}{\sqrt{n}}\right)$$
where d = n − 1 = 11.
3. From the question, the confidence level is C = 95%.
4. For 11 degrees of freedom and a 95% confidence level, the t-value is 2.201.
5. The intervals are $\bar{X} \pm t_d\, s/\sqrt{n}$, and using values from this sample, the intervals are
$$\bar{X} \pm 2.201 \times \frac{3.27}{\sqrt{12}} = \bar{X} \pm 2.201 \times \frac{3.27}{3.464} = \bar{X} \pm (2.201 \times 0.944) = \bar{X} \pm 2.078$$
For $\bar{X}$ = 12.20, the interval is 12.20 ± 2.078.

Thus the 95% interval estimate for the true mean wage level for all males who have lost jobs because of the plant shutdown is ($10.12, $14.28).

For the female workers, the same steps yield the intervals

$$\bar{X} \pm 2.201 \times \frac{3.53}{\sqrt{12}} = \bar{X} \pm 2.201 \times \frac{3.53}{3.464} = \bar{X} \pm (2.201 \times 1.019) = \bar{X} \pm 2.243$$
For $\bar{X}$ = 8.11, the interval is 8.11 ± 2.243. Thus the 95% interval estimate for the true mean wage level for all females who have lost jobs because of the plant shutdown is ($5.87, $10.35).

Comment on results. From the data in Table 1, the samples provide evidence that the hourly wages of both male and female workers have declined since the plant shutdown. The twelve males in the sample had a mean wage of $12.20 after the shutdown, $1.56 per hour less than the $13.76 they were earning prior to the shutdown. On average, the twelve female workers suffered a decline of $3.69 per hour, from $11.80 prior to the shutdown to $8.11 after the shutdown.

The interval estimates provide fairly strong evidence that female workers as a whole suffered a decline in hourly wages, while the evidence for a decline is not so clear in the case of males.

Consider first the female interval estimate. There is 95% confidence that a sample of size twelve yields a sample mean that differs from the true mean by no more than $2.24. In the case of this sample, the interval is from $5.87 to $10.35. While a researcher cannot be certain this interval contains the true mean hourly wage for females after the plant shutdown, it very likely does. But this interval lies well below the former mean hourly wage of $11.80 per hour. As a result, it seems fairly certain that female wages after the shutdown are lower than prior to the shutdown.

In contrast, the decline in male wages was less than that for females, and the 95% estimate for this sample yields an interval for males that contains the previous mean pay of $13.76. Since the researcher is relatively certain that the true mean male hourly wage is in the interval from $10.12 to $14.28, it is possible that the true mean for all male workers is around the former mean hourly wage of $13.76. While the sample mean is less than the previous mean, it is not a lot less. There is thus weak evidence for a decline in male hourly wages, but the evidence is not as strong as in the female case.
The interval estimates do not provide direct tests of whether mean wages have changed or not; that will be provided later in the section on hypothesis testing. But the results of hypothesis tests will be shown to be consistent with the comments above; that is, there is evidence that female wages declined, but insufficient evidence to conclude that male wages declined.
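The calculations in the example can be reproduced with the short sketch below, which also checks whether each pre-shutdown mean wage falls inside the corresponding 95% interval. The function and variable names are hypothetical; the figures are those of Table 1 and the text, and the t-value 2.201 for 11 degrees of freedom is taken from the t-table. As noted above, checking whether a former mean lies inside an interval is only an informal comparison, not a hypothesis test.

```python
import math

T_11 = 2.201  # 95% t-value for d = 12 - 1 = 11, from the t-table (p. 911)

# Table 1 (Leach and Winson): post-shutdown sample statistics, plus the
# pre-shutdown mean hourly wages quoted in the text.
GROUPS = {
    "male":   {"mean": 12.20, "sd": 3.27, "n": 12, "before": 13.76},
    "female": {"mean": 8.11,  "sd": 3.53, "n": 12, "before": 11.80},
}


def interval_95(mean: float, sd: float, n: int, t_value: float) -> tuple:
    """Return (low, high) for X-bar +/- t_d * s / sqrt(n)."""
    half_width = t_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def summarize() -> dict:
    results = {}
    for name, g in GROUPS.items():
        low, high = interval_95(g["mean"], g["sd"], g["n"], T_11)
        results[name] = {
            "interval": (round(low, 2), round(high, 2)),
            # Does the pre-shutdown mean still lie inside the 95% interval?
            "contains_before": low <= g["before"] <= high,
        }
    return results


if __name__ == "__main__":
    for name, r in summarize().items():
        print(name, r)
```

It should report (10.12, 14.28) for males, which contains the former mean of 13.76, and (5.87, 10.35) for females, which does not contain the former mean of 11.80, matching the comments above.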
Small and large sample sizes

Small samples generally result in fairly wide confidence intervals, thus providing less precise estimates of the mean than do larger samples. Comparing the intervals for small samples,

$$\bar{X} \pm t_d \frac{s}{\sqrt{n}},$$

with those from larger samples,

$$\bar{X} \pm Z \frac{s}{\sqrt{n}},$$

there are two differences that contribute to this.

1. If the sample has a small sample size, the t-distribution must be used, rather than the normal distribution. As noted earlier, and as can be seen by comparing the t-values with the corresponding normal values, t-values are always larger than the corresponding Z-values for any given confidence level. The ± section of the interval is thus larger for small n, since the larger t-value, rather than the smaller Z-value, must be used.
2. When the sample size is smaller, the square root of the sample size is also smaller. Since the square root of the sample size is smaller in this case, the denominator of $s/\sqrt{n}$ is smaller, so the fraction as a whole is larger.

Table 2 illustrates the difference between a small and a large sample size for samples from a hypothetical population. As in the case of the last example, a sample standard deviation of s = 12 is hypothesized for each of these samples. For the sample of size 16, the intervals are $\bar{X} \pm 6.4$, while for the sample of size 256, the intervals are $\bar{X} \pm 1.5$. From the table, it can be seen that this large difference in interval widths emerges from the two factors mentioned above. For the small sample, the t-value of 2.131 is larger than the Z-value of 1.96 for the larger sample. In addition, the standard error, or standard deviation of the sample mean, is 3 in the case of the small sample and only 12/16 = 0.75 in the case of the large sample.

Table 2: Precision of estimates for 95% intervals, small sample of size 16 and large sample of size 256, with a sample standard deviation of s = 12

              Characteristics of sample
Size of      Sample        √n      s/√n             t(s/√n) or
sample       size (n)                               Z(s/√n)
Small        16            4       12/4 = 3         2.131 × 3 = 6.393
Large        256           16      12/16 = 0.75     1.96 × 0.75 = 1.47

Given these different sized intervals that can emerge from different samples, the next section is devoted to determining appropriate sample sizes prior to the sample being selected.

Last edited March 18, 2005.
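The arithmetic behind Table 2 can be checked with a few lines of code. This sketch assumes only what the table states: s = 12 for both samples, t = 2.131 for 15 degrees of freedom (small sample) and Z = 1.96 (large sample). The helper name `precision` is hypothetical. Computing $s/\sqrt{n}$ directly gives 12/16 = 0.75 for the large sample.

```python
import math

S = 12.0  # sample standard deviation hypothesized for both samples in Table 2


def precision(n: int, multiplier: float, s: float = S) -> tuple:
    """Return (sqrt(n), s / sqrt(n), multiplier * s / sqrt(n)) for one row."""
    root_n = math.sqrt(n)
    std_error = s / root_n
    return root_n, std_error, multiplier * std_error


if __name__ == "__main__":
    # t for d = 16 - 1 = 15 (t-table) for the small sample; Z = 1.96 for the large one.
    for label, n, multiplier in [("Small", 16, 2.131), ("Large", 256, 1.96)]:
        root_n, se, half = precision(n, multiplier)
        print(f"{label}: n = {n}, sqrt(n) = {root_n:g}, s/sqrt(n) = {se:.3f}, "
              f"half-width = {half:.3f}")
```

It should print half-widths of 6.393 for n = 16 and 1.470 for n = 256, so the small-sample interval is a little over four times as wide as the large-sample one.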