THE UNIVERSITY OF HONG KONG
School of Economics & Finance
Answer Keys to 2003-2004 1st Semester Examination
Economics: ECON1003 Analysis of Economic Data
Dr K F Wong

1. (6 points) State and explain briefly the Central Limit Theorem.

For a sample drawn from a population with population mean µ and population variance $\sigma^2$, the Central Limit Theorem states that if (1) the sample size n is sufficiently large (n ≥ 30), then (2) the population of all possible sample means is approximately normally distributed, with (3) mean $\mu_{\bar{x}} = \mu$ and (4) standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, (5) no matter what probability distribution describes the sampled population. Furthermore, the larger the sample size n is, the more nearly normally distributed is the population of all possible sample means.

2. (6 points) A fast food restaurant located near HKU claims they will prepare your order in less than 60 seconds. An undercover reporter of HKU Campus Newspaper monitored 30 consecutive orders at the restaurant. The number of seconds to prepare orders is reported below.

88 24 44 62 52 44 60 52 36 56 24 80 34 26 28 34 50 58 30 60 20 56 32 66 48 40 58 68 46 26

a. (4 points) Develop a stem-and-leaf plot for the time to prepare an order.

Stem and Leaf plot for # 1
stem unit = 10, leaf unit = 1

Frequency   Stem   Leaf
    6        2     0 4 4 6 6 8
    5        3     0 2 4 4 6
    5        4     0 4 4 6 8
    7        5     0 2 2 6 6 8 8
    5        6     0 0 2 6 8
    0        7
    2        8     0 8
   30
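The stem-and-leaf plot in part (a) can be reproduced with a few lines of Python. This is only a sketch for checking the counts: the function name `stem_and_leaf` and the variable `times` are illustrative choices, not part of the exam, and the stem unit of 10 with leaf unit of 1 is taken from the plot header.

```python
from collections import defaultdict

# Preparation times in seconds, copied from question 2.
times = [88, 24, 44, 62, 52, 44, 60, 52, 36, 56,
         24, 80, 34, 26, 28, 34, 50, 58, 30, 60,
         20, 56, 32, 66, 48, 40, 58, 68, 46, 26]


def stem_and_leaf(data, stem_unit=10):
    """Return {stem: sorted leaves}, keeping empty stems between min and max."""
    groups = defaultdict(list)
    for value in data:
        # Integer division gives the stem, the remainder gives the leaf.
        groups[value // stem_unit].append(value % stem_unit)
    stems = range(min(groups), max(groups) + 1)
    return {stem: sorted(groups.get(stem, [])) for stem in stems}


plot = stem_and_leaf(times)
print("Frequency  Stem  Leaf")
for stem, leaves in plot.items():
    print(f"{len(leaves):>9}  {stem:>4}  {' '.join(map(str, leaves))}")
print(f"{len(times):>9}")
```

Keeping the empty stem 7 in the output makes the gap between the bulk of the data and the two largest observations visible, which is the feature the report in part (b) relies on.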
b. (2 points) Pretending yourself to be the reporter, summarize your findings in a brief report.

Aside from two outliers (80 and 88), the times to prepare an order are distributed fairly evenly from 20 to 70 seconds. In particular, 5 out of 30 observations (16.7%), namely 62, 66, 68, 80 and 88, are above 60 seconds. Two further orders took exactly 60 seconds, which also fails a claim of less than 60 seconds. We conclude that the fast food restaurant is not doing well in keeping its pledge.

3. (5 points) The government is attempting to choose one of two sites (A or B) as the location for its new emergency facility. After the new emergency facility becomes available for service, the current emergency facility will be shut down. The project manager has estimated the following response times in minutes from each of the proposed sites to the four areas that must be served by the emergency facility.

                  Area Served
Proposed Site     1      2      3      4
A                 5.2    4.4    3.6    6.5
B                 6.0    7.4    3.4    4.0

The number of emergency runs from the current emergency facility to each of the four areas over the past year is as follows:

Area              1      2      3      4
Number of runs    150    65     175    92

Compute the weighted mean response time from both proposed locations and determine which proposed site should be selected for the new emergency facility.

Weighted mean response time for site A:
$\mu_A$ = [150*5.2 + 65*4.4 + 175*3.6 + 92*6.5] / [150 + 65 + 175 + 92] = 2294/482 = 4.76 min

Weighted mean response time for site B:
$\mu_B$ = [150*6.0 + 65*7.4 + 175*3.4 + 92*4.0] / [150 + 65 + 175 + 92] = 2344/482 = 4.86 min

Because site A has a shorter weighted mean response time than site B, we should choose site A.

4. (10 points) An insurance company will insure the full value of damage of residential property against fire accident at a premium of $1000 per year. Suppose that there is 0.005 probability of a big fire accident with a damage of $50,000. There is 0.010 probability of a small fire accident with a damage of $4,000. Let x denote the insurance company's profit.

a. (3 points) Set up the probability distribution of the random variable x.

Let Z be the premium charged.
Event        P(event)   Profit x
Big fire     0.005      Z - 50000 (= -49000 for Z = 1000)
Small fire   0.010      Z - 4000 (= -3000 for Z = 1000)
No fire      0.985      Z (= 1000 for Z = 1000)

b. (3 points) Calculate the insurance company's expected profit.
Expected profit = P(big fire) * Profit(big fire) + P(small fire) * Profit(small fire) + P(no fire) * Profit(no fire)
= 0.005*(Z - 50000) + 0.010*(Z - 4000) + 0.985*Z
= Z - [0.005*50000 + 0.010*4000]
= Z - 290
= 710 for Z = 1000.

c. (4 points) Find the premium that the insurance company should charge if it wants its expected profit to be $500.

From (b), expected profit = Z - 290. To get an expected profit of 500, we must charge Z = 500 + 290 = 790.

5. (6 points) In one of our in-class experiments, we estimated the percentage of earth surface covered by water. We did that by randomly throwing an inflated globe 50 times and catching it by hand. For each catch, we checked whether a mark on our thumb landed on water of the inflated globe. After 50 random throws, we estimated that 30 percent of earth surface is covered by water.

a. (3 points) Construct a 95% confidence interval for the percentage of earth covered by water.

Let the sample proportion be p (0.3 in this question) and the sample size be n (50 in this question). Note that from the formula given (binomial), the variance for a single throw (sample size = 1) is p(1-p). According to the central limit theorem, the sample proportion is approximately normally distributed with mean equal to the population proportion π and variance π(1-π)/n. However, the sample variance can be computed only once p is fixed, so one degree of freedom is lost. Thus, an unbiased variance estimator is $s_p^2 = p(1-p)/(n-1) = 0.3 \times 0.7/49 = 0.00429$, and $\sqrt{s_p^2} = 0.0655$.

Thus, the 95% confidence interval is
$[p - 1.96\sqrt{s_p^2},\; p + 1.96\sqrt{s_p^2}]$
= [0.3 - 1.96*0.0655, 0.3 + 1.96*0.0655]
= [0.3 - 0.1283, 0.3 + 0.1283]
= [0.1717, 0.4283]

b. (3 points) What is the probability of getting an estimate of 30 percent of earth surface covered by water when actually 70 percent of earth surface is covered by water?

Under the null hypothesis that actually 70 percent of earth surface is covered by water, the variance of p is $\sigma_p^2 = \pi(1-\pi)/n = 0.7 \times 0.3/50 = 0.0042$. That is, p is approximately normally distributed with mean 0.7 and variance 0.0042.
The probability of getting an estimate of 30 percent or less of earth surface covered by water
= Prob(p <= 0.3)
= Prob((p - 0.7)/sqrt(0.0042) <= (0.3 - 0.7)/sqrt(0.0042))
= Prob(z <= -0.4/0.0648)
= Prob(z <= -6.17)
≈ 0,
where sqrt(0.0042) = 0.0648 is the standard deviation of p under the null hypothesis.

The probability of getting an estimate of exactly 30 percent of earth surface covered by water = Prob(p = 0.3) = 0 (because p is treated as a continuous variable under the normal approximation).

The alternative approach is to use the binomial formula to compute the probability of getting 15 successes in 50 draws:
$P(x) = {}_nC_x \, \pi^x (1-\pi)^{n-x}$
$P(x = 15) = \frac{50!}{15!\,35!} \times 0.7^{15} \times 0.3^{35} \approx 0$

6. (6 points) Four employees who work as take-out attendants at a local fast food restaurant are being evaluated. As a part of a quality improvement initiative and employee evaluation
these workers were observed over three days. One of the statistics collected is the proportion of time an employee forgets to include a napkin in the bag. Related information is given in the table below.

Worker   % of Dinners Packed   % Forgot Napkin
Peter    25%                   6%
Paul     20%                   2%
Mary     20%                   10%
Nancy    35%                   4%

You just purchased a dinner and found that there is no napkin in your bag. What is the probability that Mary prepared your order?

The probability that Mary prepared your order and forgot to put in a napkin is
P(Mary and forgot napkin) = 0.20*0.10 = 0.02.

The probability that no napkin is found in your bag is
P(no napkin) = P(Peter and forgot napkin) + P(Paul and forgot napkin) + P(Mary and forgot napkin) + P(Nancy and forgot napkin)
= 0.25*0.06 + 0.20*0.02 + 0.20*0.10 + 0.35*0.04
= 0.053

The probability that Mary prepared your order given that there is no napkin in your bag is
P(Mary | no napkin) = P(Mary and forgot napkin) / P(no napkin) = 0.02/0.053 = 0.3774

7. (10 points) In Hong Kong, about 50 out of every 1000 children are gifted and talented. Children are classified as gifted and talented based on some IQ test. Currently in undergraduate admission in Hong Kong universities, students with six or more distinctions ("A" grade) in Certificate Examination (a local public examination) are treated as gifted and talented and can enter university a year earlier than their peers. This test (whether the student has six or more distinctions in Certificate Examination) is not perfect. One out of every 1000 tests given to students who are gifted and talented comes out negative, that is, P(less than six distinctions in Certificate Examination | gifted and talented) = 1/1000. Out of every 1000 people who are not gifted and talented, 50 of them test positive, that is, P(six or more distinctions in Certificate Examination | not gifted and talented) = 5/100.

a. (4 points) Draw a tree diagram describing probability of these events.
GT (50/1000 = 0.05)
    >= 6As (999/1000 = 0.999)
    < 6As (1/1000 = 0.001)
Not GT (950/1000 = 0.95)
    >= 6As (50/1000 = 0.05)
    < 6As (950/1000 = 0.95)

b. (3 points) Using your figures from your tree diagram find P(gifted and talented | six or more distinctions in Certificate Examination), i.e., the probability that a student is gifted and talented when he/she gets six or more distinctions in Certificate Examination.

P(GT | >=6As) = P(GT and >=6As)/P(>=6As).
P(>=6As) = P(>=6As | GT)*P(GT) + P(>=6As | not GT)*P(not GT) = (0.999)*(0.05) + (0.05)*(0.95) = 0.09745.
P(GT and >=6As) = P(>=6As | GT)*P(GT) = (0.999)*(0.05) = 0.04995.
Hence, P(GT | >=6As) = P(GT and >=6As)/P(>=6As) = 0.04995/0.09745 = 0.5126.

c. (3 points) Using your figures from your tree diagram find P(not gifted and talented | less than six distinctions in Certificate Examination), i.e., the probability that a student is not gifted and talented when he/she gets less than six distinctions in Certificate Examination.

P(not GT | <6As) = P(not GT and <6As)/P(<6As).
P(<6As) = P(<6As | GT)*P(GT) + P(<6As | not GT)*P(not GT) = (0.001)*(0.05) + (0.95)*(0.95) = 0.90255.
P(not GT and <6As) = P(<6As | not GT)*P(not GT) = (0.95)*(0.95) = 0.9025.
Hence, P(not GT | <6As) = P(not GT and <6As)/P(<6As) = 0.9025/0.90255 = 0.9999446.

8. (8 points) Consider five independent identically distributed random variables X1, X2, X3, X4 and X5. They all have unit mean and unit variance. That is, E(X1) = E(X2) = E(X3) = E(X4) = E(X5) = 1, and V(X1) = V(X2) = V(X3) = V(X4) = V(X5) = 1. Compute the mean and variance of a new random variable Y = (X1+X2+X3+X4+X5)/5.

E(Y) = E[(X1+X2+X3+X4+X5)/5] = [E(X1)+E(X2)+E(X3)+E(X4)+E(X5)]/5 = 5/5 = 1.
V(Y) = V[(X1+X2+X3+X4+X5)/5] = V[X1+X2+X3+X4+X5]/25 = [V(X1)+V(X2)+V(X3)+V(X4)+V(X5)]/25 = 5/25 = 1/5, where the variances add because the variables are independent.
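The two conditional probabilities in question 7 both follow the same two-branch Bayes calculation, so they can be checked with one small helper. The sketch below is an illustration only; the function name `posterior` and the variable names are assumptions introduced here, and the inputs are the probabilities stated in the question.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Two-branch Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    joint_true = prior * likelihood_if_true          # P(H and E)
    joint_false = (1 - prior) * likelihood_if_false  # P(not H and E)
    return joint_true / (joint_true + joint_false)


# Probabilities stated in question 7.
p_gt = 0.05                # P(GT)
p_pos_given_gt = 0.999     # P(>=6As | GT)
p_pos_given_not_gt = 0.05  # P(>=6As | not GT)

# Part b: hypothesis "GT", evidence ">=6As".
p_gt_given_pos = posterior(p_gt, p_pos_given_gt, p_pos_given_not_gt)

# Part c: hypothesis "not GT", evidence "<6As".
p_not_gt_given_neg = posterior(1 - p_gt,
                               1 - p_pos_given_not_gt,
                               1 - p_pos_given_gt)

print(round(p_gt_given_pos, 4), round(p_not_gt_given_neg, 7))
```

The same helper does not apply directly to question 6, which has four branches rather than two; there the denominator is the sum of four joint probabilities.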
9. (8 points) A marketing research company surveyed grocery shoppers in Hong Kong and Macau to see the percentage of the customers who prefer chicken to other meat. The data are given below.

             Sample size   Number of customers in the sample that prefer chicken
Hong Kong    492           156
Macau        386           172

a. (3 points) Is the proportion of customers who prefer chicken higher in Macau than in Hong Kong? Test at α = 0.05.

$H_0: \pi_{\text{Macau}} = \pi_{\text{Hong Kong}}$
$H_A: \pi_{\text{Macau}} > \pi_{\text{Hong Kong}}$

It is a one-sided test. We will reject the null if the standardized difference of the sample proportions ($p_{\text{Macau}} - p_{\text{Hong Kong}}$) is too large (larger than $Z_{0.05}$).

$p_{\text{Macau}}$ = 172/386 = 0.4456
$p_{\text{Hong Kong}}$ = 156/492 = 0.317

The relevant standard deviation under the null uses the common proportion in the two cities:
$p_{\text{Macau+Hong Kong}}$ = (172 + 156)/(386 + 492) = 0.3736
$s_{\text{Macau+Hong Kong}} = [0.3736 \times 0.6264 \times (1/492 + 1/386)]^{1/2} = 0.0329$

Hence the test statistic is
z = (0.4456 - 0.317)/0.0329 = 3.91 > 1.645 = $Z_{0.05}$,
so we reject $H_0$. Note it is a one-sided test. Thus, we should use $Z_{0.05}$ instead of $Z_{0.025}$.

b. (3 points) Determine the 99% confidence interval for the difference between the proportion of customers who prefer chicken in Macau and the proportion of customers who prefer chicken in Hong Kong.

The relevant standard deviation uses the individual proportions of the two cities:
$s_{p_{\text{Macau}} - p_{\text{Hong Kong}}} = [0.317 \times 0.683/491 + 0.4456 \times 0.5544/385]^{1/2} = 0.0329$
$p_{\text{Macau}} - p_{\text{Hong Kong}}$ = 0.4456 - 0.317 = 0.1286
Confidence interval = [0.1286 - 2.58*0.0329, 0.1286 + 2.58*0.0329] = [0.1286 - 0.0849, 0.1286 + 0.0849] = [0.0437, 0.2135]

c. (2 points) The 95% confidence interval for the difference between the proportion of customers who prefer chicken in Macau and the proportion of customers who prefer chicken in Hong Kong is 0.0642 to 0.193. Provide a one-sentence interpretation of this interval.
We are 95% confident that the proportion of Hong Kong customers who prefer chicken to other meat is between 6.42 and 19.3 percentage points lower than the proportion of Macau customers who prefer chicken to other meat.

10. (6 points) A researcher has used a one-way analysis of variance model to test whether the average starting salaries differ among the recent graduates from nursing, engineering, business and education disciplines. She has randomly selected four graduates from each of the four areas.

a. (3 points) If MSE = 4 and SSTO = 120, complete the following ANOVA table and determine the value of the F statistic.
Source      SS    DF   MS   F
Treatment   72    3    24   6
Error       48    12   4
Total       120   15

With 4 groups of 4 graduates, the treatment degrees of freedom are 4 - 1 = 3 and the error degrees of freedom are 16 - 4 = 12. Then SSE = MSE*12 = 48, the treatment sum of squares is 120 - 48 = 72, the treatment mean square is 72/3 = 24, and F = 24/4 = 6.

b. (3 points) Is there a significant difference in the starting salaries among the four disciplines?

$H_0$: Mean starting salaries are the same among the four disciplines.
$H_1$: Mean starting salaries differ for at least one pair of disciplines.

Note that the question does not specify any level of significance for the hypothesis test. We are free to use any reasonable level of significance, e.g., 0.01, 0.05 or 0.10. For example, at the 0.05 level of significance, (F = 6) > $F_{0.05,3,12}$ = 3.49, therefore we reject $H_0$.

11. (4 points) State and explain briefly Chebyshev's Theorem.

Chebyshev's Theorem states that given any population that has mean µ and standard deviation σ, for any value of k greater than 1, at least $100(1 - 1/k^2)\%$ of the population measurements lie in the interval [µ ± kσ].

12. (3 points) During off-peak hours, cars arrive at a tollbooth of the Western District Cross-Harbour Tunnel at an average rate of 0.5 cars per minute. What is the probability that during the next minute at least three cars will arrive?

First, we identify that the number of cars arriving at the tollbooth can be described by a Poisson distribution. Second, we note that we can compute the probability by focusing on the complement:
P(at least three cars) = P(X=3) + P(X=4) + P(X=5) + ... = 1 - P(X=0) - P(X=1) - P(X=2).

Using the given formula,
$P(X=0) = e^{-0.5} \times 0.5^0/0! = e^{-0.5}$,
$P(X=1) = e^{-0.5} \times 0.5^1/1! = e^{-0.5} \times 0.5$,
$P(X=2) = e^{-0.5} \times 0.5^2/2! = e^{-0.5} \times 0.5^2/2$.
$P(X=0) + P(X=1) + P(X=2) = e^{-0.5}[1 + 0.5 + 0.5^2/2] = 0.6065 \times 1.625 = 0.9856$.
Hence, P(at least three cars) = 1 - 0.9856 = 0.0144.
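The complement calculation in question 12 can be verified numerically with the standard library alone. This is a minimal sketch; the helper name `poisson_pmf` and the variable names are illustrative, and the rate of 0.5 cars per minute is the one given in the question.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson random variable with the given mean rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


rate = 0.5  # average arrivals per minute, from question 12

# P(X <= 2) is the sum of the first three Poisson probabilities.
p_at_most_two = sum(poisson_pmf(k, rate) for k in range(3))

# The event "at least three cars" is the complement of "at most two cars".
p_at_least_three = 1 - p_at_most_two

print(f"P(X <= 2) = {p_at_most_two:.4f}")
print(f"P(X >= 3) = {p_at_least_three:.4f}")
```

Working through the complement avoids summing an infinite series: only three terms are needed instead of every term from X = 3 upward.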
13. (10 points) Suppose that an airline quotes a flight time of 2 hours, 10 minutes between two cities. Furthermore, suppose that historical flight records indicate that the actual flight time between the two cities, x, is uniformly distributed between 2 hours (i.e., 120 minutes) and 2 hours, 20 minutes (i.e., 140 minutes). Letting the time unit be one minute,

a. (1 point) Write the formula for the probability curve of x.

The general formula for the uniform distribution over an interval [a, b] is
Prob(x < k) = (k - a)/(b - a) = (k - 120)*0.05 for k in the interval,
Prob(x < k) = 0 for k < a,
Prob(x < k) = 1 for k > b.
Or, from the formula given, density = 1/(b - a) = 0.05 inside the interval, and 0 otherwise.

b. (2 points) Graph the probability curve of x.

[Graph: the density is a horizontal line at height 1/(b - a) = 0.05 between a = 120 and b = 140, and 0 elsewhere.]

c. (2 points) Find the probability that a randomly selected flight between the two cities will be at least five minutes late.

Prob(x > 130 + 5) = Prob(x > 135) = 1 - Prob(x < 135) = 1 - (135 - 120)*0.05 = 1 - 0.75 = 0.25.

d. (2 points) Calculate the mean flight time and the standard deviation of the flight time.

From the formula given:
mean = (a + b)/2 = (120 + 140)/2 = 130,
standard deviation = $(b - a)/\sqrt{12}$ = (140 - 120)/3.464 = 5.7735.

e. (3 points) Find the probability that the flight time will be within one standard deviation of the mean.

Prob(130 - 5.7735 < x < 130 + 5.7735) = Prob(124.2265 < x < 135.7735) = Prob(x < 135.7735) - Prob(x < 124.2265) = (135.7735 - 124.2265)*0.05 = 5.7735*2*0.05 = 0.57735

14. (22 points) Suppose that a regression is run using the number of hours of study time per
week (STUDY) as the dependent variable. There are 35 observations. The independent variables are:

WKHRS:     The number of hours worked per week at a job,
COMMUTER:  A dummy variable equal to one for commuting students, and 0 for resident students,
MALE:      A dummy variable equal to one if the student is male, and 0 if the student is female,
SENIOR:    A dummy variable equal to one if the student is a senior, and 0 otherwise.

The results are as follows.

Variable   Coefficient   Standard error
CONSTANT   10.0          20.0
WKHRS      -0.5          0.125
COMMUTER   2.0           2.0
MALE       -3.0          6.0
SENIOR     6.0           2.0

Analysis of Variance:
Source of variation   Sum of squares   Degrees of freedom   Mean square
Regression            160.0
Error                 40.0
Total                 200.0

a. (3 points) Complete the ANOVA table.

Analysis of Variance:
Source of variation   Sum of squares   Degrees of freedom   Mean square
Regression            160.0            k = 4                40
Error                 40.0             n-(k+1) = 30         1.33
Total                 200.0            n-1 = 34

b. (2 points) Compute the standard error of the estimate (or standard error of the regression).

Standard error of estimate = the square root of {sum of squared (prediction) errors / [n-(k+1)]} = $[40.0/30]^{1/2} = 1.33^{1/2} = 1.1547$.

c. (3 points) Compute and interpret the unadjusted coefficient of determination.

Unadjusted coefficient of determination = Explained variation / Total variation = Regression sum of squares / Total sum of squares = 160/200 = 0.8. That is, 80% of the variation in weekly study time in the sample is explained by the regression.

d. (2 points) Compute the coefficient of determination adjusted for degrees of freedom.

According to the formula given,
Adjusted $R^2 = [R^2 - k/(n-1)] \times [(n-1)/(n-(k+1))]$ = [0.8 - 4/34] * [34/30] = 0.7733.

e. (2 points) Test at the 5% level whether the coefficient on the variable MALE is equal to zero.
$H_0$: The coefficient on the variable MALE is equal to zero.
$H_1$: The coefficient on the variable MALE is not equal to zero.

Note that it is a two-sided test, i.e., we will reject the null if the estimated coefficient (and the corresponding test statistic) is either too small or too large. Sample estimate of the coefficient = -3.0. Sample estimate of the corresponding standard error = 6.0. The test statistic is t = (-3.0 - 0)/6.0 = -0.5. Because the t statistic is less than the large-sample critical value 1.96 in absolute terms, we cannot reject $H_0$.

f. (2 points) Test at the 5% level the null hypothesis that the coefficient on the variable COMMUTER is equal to zero, versus the alternative that it is more than zero.

$H_0$: The coefficient on the variable COMMUTER is equal to zero.
$H_1$: The coefficient on the variable COMMUTER is more than zero.

Note that it is a one-sided test, i.e., we will reject the null if the estimated coefficient is too large. Sample estimate of the coefficient = 2.0. Sample estimate of the corresponding standard error = 2.0. The test statistic is t = (2.0 - 0)/2.0 = 1. Because the t statistic is less than 1.645 (also accept 1.64 or 1.65 as answer), we cannot reject $H_0$.

g. (2 points) Test at the 5% level the hypothesis that all the slope coefficients are zero.

$H_0$: All the slope coefficients are zero.
$H_1$: At least one slope coefficient is not zero.

We use the F-test. Under the null, the F-stat = mean regression sum of squares / mean error sum of squares is distributed as F with numerator degrees of freedom k = 4 and denominator degrees of freedom n - (k+1) = 30. The sample F-stat = 40/1.33 = 30.075. Because this sample F-stat is larger than the 5% critical value of 2.69, we reject $H_0$ and favor the alternative.

h. (2 points) What is the expected number of hours of study time per week for a senior male resident student who works 8 hours per week at a job?

The description is translated into WKHRS = 8, COMMUTER = 0, MALE = 1, SENIOR = 1.
Hence, the expected number of hours of study time per week is 10.0 - 0.5*8 + 2*0 - 3*1 + 6*1 = 9.

i. (2 points) How much does expected study time change if a student works an additional hour at a job? Specify whether this change is an increase or a decrease in study time.

The expected change in study time if a student works an additional hour at a job can be read from the coefficient of WKHRS. That is, we expect a decrease of 0.5 hours in study time if a student works an additional hour at a job.

j. (2 points) According to the regression results, do seniors study more or less than non-senior students? By how much?

The answer can be read from the coefficient of SENIOR. That is, seniors generally study 6 hours more per week than non-senior students.
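The summary statistics and the prediction in question 14 can be recomputed from the reported coefficients and sums of squares. The sketch below is a check under the figures given in the question; the names `coef`, `predict_study` and the other variables are illustrative assumptions, not part of the exam. Because it does not round the mean square error to 1.33, the F statistic comes out as 30 rather than the 30.075 obtained from the rounded value.

```python
import math

# Reported regression output from question 14.
coef = {"CONSTANT": 10.0, "WKHRS": -0.5, "COMMUTER": 2.0,
        "MALE": -3.0, "SENIOR": 6.0}
n, k = 35, 4                          # observations, slope coefficients
ss_regression, ss_error = 160.0, 40.0
ss_total = ss_regression + ss_error

# Parts a and b: mean squares and standard error of the estimate.
ms_regression = ss_regression / k
ms_error = ss_error / (n - (k + 1))
std_error_estimate = math.sqrt(ms_error)

# Parts c and d: unadjusted and adjusted coefficient of determination,
# using the adjusted formula quoted in the answer to part d.
r_squared = ss_regression / ss_total
adj_r_squared = (r_squared - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

# Part g: overall F statistic.
f_stat = ms_regression / ms_error


def predict_study(wkhrs, commuter, male, senior):
    """Expected weekly study hours from the fitted equation (part h)."""
    return (coef["CONSTANT"] + coef["WKHRS"] * wkhrs
            + coef["COMMUTER"] * commuter
            + coef["MALE"] * male + coef["SENIOR"] * senior)


print(f"SEE = {std_error_estimate:.4f}, R2 = {r_squared:.2f}, "
      f"adjusted R2 = {adj_r_squared:.4f}, F = {f_stat:.2f}")
print("Senior male resident working 8 hours:", predict_study(8, 0, 1, 1))
```

Changing one argument of `predict_study` at a time reproduces parts i and j: raising `wkhrs` by 1 lowers the prediction by 0.5, and switching `senior` from 0 to 1 raises it by 6.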