© UNIVERSITY OF LEEDS
Examination for the Module MATH1725 (May-June 2009)
INTRODUCTION TO STATISTICS
Time allowed: 2 hours

This question paper consists of 11 printed pages, each of which is identified by the reference. Only approved basic scientific calculators may be used. Statistical tables are provided at the end of the exam paper.

Attempt ALL questions in Section A and TWO questions from Section B.

Questions A1 to A10 require you to write down a single letter answer. Questions A11 to A20 require you to write down a short explanation.

Your answers to Section A questions and Section B questions may be written in the same answer book.

Sections A and B are each worth 50% of the examination marks. Questions A1 to A20 carry equal weight.

SECTION A

Attempt ALL questions in Section A. Questions A1 to A10 require you to write down a single letter answer.

A1. The incomes (in units of 1000) of ten directors of Marks and Spencer plc during 2008 were:
453, 1375, 698, 293, 701, 73, 57, 68, 79, 73.
What does the sample median equal?
A 57, B 137.5, C 186, D 293, E 387.

A2. If $Z \sim N(0, 1)$, what is the value of $P(Z > 1.15)$?
A 0.1251, B 0.4404, C 0.8500, D 0.8749, E 1.0000.

A3. Suppose variables $X_1, X_2, \dots, X_n$ have common mean $\mu$ and variance $\sigma^2$. Their mean $\bar{X}$ is said to be an unbiased estimator of $\mu$. What does this tell you?
A $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, B $E[\bar{X}] = \mu$, C $\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{n}$, D $\mu$ is a special measure of spread.

A4. A sample correlation coefficient $r_{xy}$ equals 1. What does this definitely tell you about the corresponding scatter plot of $y$ against $x$?
A data points closely scattered about a straight line, B data points all lie on a straight line with slope 1, C data points all lie on a straight line with zero slope, D data points all lie on a straight line with negative slope.

A5. A least squares regression problem has $n$ pairs of data $(x_i, y_i)$, $i = 1, 2, \dots, n$. The fitted least squares regression line is $y = \hat{\alpha} + \hat{\beta}x$. Which quantity is minimised to derive $\hat{\alpha}$ and $\hat{\beta}$?
A $\sum_{i=1}^{n} |y_i - \alpha - \beta x_i|$, B $\sum_{i=1}^{n} (y_i + \alpha - \beta x_i)^2$, C $\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$, D $\sum_{i=1}^{n} (y_i + \alpha + \beta x_i)^2$.
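The arithmetic behind A1 and A2 can be checked with a short Python sketch using only the standard library. It assumes the standard normal CDF is evaluated through `math.erf` rather than read from Table 1; the helper name `std_normal_cdf` is illustrative only.

```python
import math
import statistics

# A1: incomes (in units of 1000) of the ten directors, as listed in the question
incomes = [453, 1375, 698, 293, 701, 73, 57, 68, 79, 73]
# With an even count, the median averages the 5th and 6th ordered values (79 and 293)
median_income = statistics.median(incomes)

def std_normal_cdf(x: float) -> float:
    """Phi(x) for the standard normal, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# A2: upper tail P(Z > 1.15) = 1 - Phi(1.15)
tail_prob = 1.0 - std_normal_cdf(1.15)

print(median_income)        # 186.0
print(round(tail_prob, 4))  # 0.1251
```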

A6. The boxplot below shows the heights in metres for 25 male students and 25 female students.
[Boxplot of height (m) for Females and Males; horizontal axis from 1.55 to 1.85.]
Which of the following statements are true?
(i) The median height of males is less than the median height of females.
(ii) The semi-interquartile range of female heights is about 0.115m.
(iii) The variability of male and female heights is about the same.
A (ii) only, B (i) and (ii), C (ii) and (iii), D (iii) only.

A7. Random variables $X$ and $Y$ have correlation coefficient 0.5. If $X$ has mean 2 and variance 4, and $Y$ has mean 1 and variance 1, what is the mean of $X - 2Y$?
A $-2$, B $-1$, C 0, D 1, E 2.

A8. In question A7 above, what is the variance of $X - 2Y$?
A $-2$, B 0, C 2, D 4, E 6, F 7.

A9. If random variables $X$ and $Y$ each have variance equal to 3 and $X + Y$ has variance 8, what does the covariance between $X$ and $Y$ equal?
A 0, B 1, C 2, D 3, E not enough information to say.

A10. For the $\chi^2$-distribution with 5 degrees of freedom, what is the value of $\chi^2_5(10\%)$?
A 9.236, B 11.07, C 15.09, D 15.99, E 18.31.
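A7 to A9 all rest on the rule for a linear combination, $E[aX+bY] = a\mu_X + b\mu_Y$ and $\mathrm{Var}[aX+bY] = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y$. The sketch below applies it; `linear_combo_moments` is a hypothetical helper written for this illustration, not part of any library.

```python
import math

def linear_combo_moments(a, b, mu_x, mu_y, var_x, var_y, rho):
    """Mean and variance of aX + bY, given the correlation rho between X and Y."""
    cov = rho * math.sqrt(var_x) * math.sqrt(var_y)  # Cov(X, Y) = rho * sd_X * sd_Y
    mean = a * mu_x + b * mu_y
    var = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov
    return mean, var

# A7/A8: X - 2Y with mu_X = 2, Var X = 4, mu_Y = 1, Var Y = 1, rho = 0.5
mean_d, var_d = linear_combo_moments(1, -2, 2, 1, 4, 1, 0.5)

# A9: Var(X + Y) = Var X + Var Y + 2 Cov(X, Y), solved for the covariance
cov_xy = (8 - 3 - 3) / 2

print(mean_d, var_d, cov_xy)  # 0 4.0 1.0
```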

Questions A11 to A20 require you to write down a short explanation.

A11. For a set of $n$ observations, what is a dot-plot?

A12. Briefly describe the central limit theorem.

A13. A sample of $n = 16$ values has sample mean $\bar{x} = 1.44$ and sample variance $s^2 = 1.44$. Is the sample mean significantly different from zero?

A14. Values $x_i$ and $y_i$, $i = 1, 2, \dots, n$, lie on a horizontal line $y = c$, where $c$ is a constant. What does the sample covariance $s_{XY}$ equal?

A15. Random variables $X$ and $Y$ are both discrete with joint probability function $p(x_i, y_j)$. How would you calculate the marginal probability function of $X$, $p_X(x_i)$?

A16. In question A15 above, how would you calculate $E[XY]$?

A17. A random sample of size $n$ is taken with replacement from a population of size $N$. If the population consists of $R$ individuals of type A and $N - R$ of type B, what is the probability that the sample contains $r$ of type A?

A18. In question A17 above, if $n$ is large and $R/N$ is close to $\frac{1}{2}$, what continuous distribution could be used as an approximation when calculating the required probability? (State also the mean and variance of this distribution.)

A19. In a sample of 161 first year Leeds University students, 21 did more than 5 hours of paid work in a given week of term. Use these data to obtain an approximate 95% confidence interval for the proportion of Leeds University students who do more than 5 hours of paid work in a week of term.

A20. In a chi-squared test with ten groups, the observed value of the chi-squared test statistic under some null hypothesis $H_0$ is $\chi^2_{\mathrm{obs}}$. What would extremely small values of $\chi^2_{\mathrm{obs}}$ suggest about the experimental data?
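As a rough check on A13, A17 and A19, the sketch below computes a one-sample t statistic, a binomial probability and a normal-approximation confidence interval for a proportion. It assumes the usual large-sample interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$; `prob_r_type_a` is a hypothetical helper, and the inputs passed to it below are made up purely for illustration.

```python
import math

# A13: one-sample t statistic for H0: mu = 0, on n - 1 = 15 degrees of freedom
n, xbar, s2 = 16, 1.44, 1.44
t_stat = xbar / math.sqrt(s2 / n)  # 1.44 / (1.2 / 4) = 4.8

# A17: with replacement, each draw is type A with probability R/N,
# so the number of type A in the sample is Binomial(n, R/N)
def prob_r_type_a(n, r, R, N):
    """Hypothetical helper: P(a sample of size n contains exactly r of type A)."""
    p = R / N
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

# A19: approximate 95% interval p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / m)
x, m = 21, 161
p_hat = x / m
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / m)
ci = (p_hat - half_width, p_hat + half_width)

print(round(t_stat, 2))                  # 4.8
print(prob_r_type_a(4, 2, 5, 10))        # 0.375 (made-up inputs)
print(round(ci[0], 3), round(ci[1], 3))  # 0.078 0.182
```

The value 4.8 lies well beyond $t_{15}(0.5) = 2.947$ in the t-table, so on this reading the mean differs from zero even at a stringent level.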

SECTION B

Attempt TWO questions from Section B.

B1. The following data give the heights of 100 male students at a certain university.

Height (in inches)   Number of students
63-65                 2
66-68                11
69-71                33
72-74                43
75-77                11

(a) Calculate the sample mean and variance for these data.

(b) A suitable normal distribution is fitted to these data and some expected frequencies have been determined as shown in the table below.

Height (in inches)   Observed Frequency   Expected Frequency
63-65                 2                    1.3
66-68                11                   12.1
69-71                33
72-74                43
75-77                11

Determine the expected frequencies for the remaining class intervals.

(c) Test whether your fitted normal distribution gives a good fit to these data.
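A minimal Python sketch of the B1 calculations follows. It rests on several assumptions not stated in the question: class midpoints stand in for the data, the variance uses divisor $n - 1$, class boundaries sit 0.5 inch beyond each stated limit, the outer classes are treated as bounded, and the first two classes are pooled so that every expected count is at least 5. Under these choices the first two expected counts come out as 1.3 and 12.1, matching the table.

```python
import math

# Class intervals (inches) and observed counts from B1
classes = [(63, 65), (66, 68), (69, 71), (72, 74), (75, 77)]
observed = [2, 11, 33, 43, 11]
n = sum(observed)

# (a) Grouped mean and variance from class midpoints (divisor n - 1 assumed)
mids = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(f * m for f, m in zip(observed, mids)) / n
var = sum(f * (m - mean) ** 2 for f, m in zip(observed, mids)) / (n - 1)
sd = math.sqrt(var)

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (b) Expected counts under N(mean, var), boundaries assumed at lo - 0.5 and hi + 0.5
expected = [n * (phi((hi + 0.5 - mean) / sd) - phi((lo - 0.5 - mean) / sd))
            for lo, hi in classes]

# (c) Pool the first two classes so every expected count is at least 5
obs_pooled = [observed[0] + observed[1]] + observed[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 2  # one constraint plus two fitted parameters (mean, variance)

print(mean, round(var, 3))  # 71.5 7.364
print(round(chi2, 2), df)   # 1.59 1
```

With one degree of freedom the statistic can be compared with $\chi^2_1(5\%) = 3.841$ from the table.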

B2. (a) Pairs of measurements $(x_i, y_i)$, $i = 1, 2, \dots, n$, are made on each of $n$ individuals. The least squares regression line for $y$ given $x$ is $y = \alpha + \beta x$. Derive the least squares estimates of $\alpha$ and $\beta$.

(b) The anxiety level of subjects in a certain stress situation was assessed using two different procedures.
(I) The state-trait anxiety inventory (STAI), consisting of twenty questions.
(II) The linear analogue (LA) score, in which the subject is asked to indicate on a 100mm scale their perceived anxiety level, with 0mm on the scale corresponding to the statement "I do not feel anxious at all" and 100mm on the scale corresponding to the statement "I could not feel more anxious".
For ten subjects the STAI and LA scores are given below.

Subject i   STAI score y_i   LA score x_i
1           20               10
2           25               0
3           29               37
4           33               28
5           36               8
6           42               47
7           45               38
8           49               39
9           49               94
10          59               78

$\sum_i x_i^2 = 22411$, $\sum_i y_i^2 = 16323$, $\sum_i x_i y_i = 17288$.

(i) Fit a least squares regression line for predicting the STAI score given an LA score.
(ii) Use your fitted regression line to predict the STAI score for an LA score $x = 20$.

(c) Define the residuals $r_i$ for your fitted model and show that the residual for subject 5 equals 7.03.
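The fit in B2(b) can be sketched with the standard estimates $\hat{\beta} = S_{xy}/S_{xx}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, where $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$ and $S_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$. The Python below is an illustration only and keeps full precision throughout.

```python
# LA scores (x) and STAI scores (y) for the ten subjects in B2
xs = [10, 0, 37, 28, 8, 47, 38, 39, 94, 78]
ys = [20, 25, 29, 33, 36, 42, 45, 49, 49, 59]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sxx = sum(x * x for x in xs) - sum_x ** 2 / n
sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

beta_hat = sxy / sxx                          # slope
alpha_hat = sum_y / n - beta_hat * sum_x / n  # intercept: ybar - beta_hat * xbar

# (b)(ii) predicted STAI score at LA score x = 20
pred_20 = alpha_hat + beta_hat * 20

# (c) residuals r_i = y_i - (alpha_hat + beta_hat * x_i)
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]

print(round(alpha_hat, 2), round(beta_hat, 4))    # 26.36 0.3257
print(round(pred_20, 2), round(residuals[4], 2))  # 32.87 7.04
```

Unrounded, the subject 5 residual is about 7.04; rounding the coefficients to 26.36 and 0.326 before substituting gives $36 - (26.36 + 0.326 \times 8) = 7.03$, the value quoted in (c).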

B3. (a) A study of blood alcohol levels (in mg/litre) at post mortem examinations of road accident victims involved taking one blood sample from the leg (column A) and another from the heart (column B). The results are tabulated below.

Case   A     B
1      153   161
2      92    93
3      186   186
4      242   244
5      55    58
6      80    82
7      126   124
8      161   167
9      302   321
10     145   149
11     39    51
12     76    81

Do these results indicate that there is a significant difference in blood alcohol levels for the same individual in the leg compared with the heart? Why is it reasonable to suppose these data can be regarded as matched pairs?

(b) The following data give the length (in mm) of cuckoo (Cuculus canorus) eggs found in nests belonging to wrens (A) and reed warblers (B).
A: 19.8 22.1 21.5 20.9 22.0 21.0 22.3 21.0 20.3 20.9
B: 23.2 22.0 22.2 21.2 21.6 21.6 21.9 22.0 22.9 22.8
Is there any evidence at the 1% level to suggest that the egg size differs between the two host species? Why is it unreasonable to suppose these data can be regarded as matched pairs?

(c) What do you understand by the phrase "matched pairs"?
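The contrast between parts (a) and (b) can be illustrated in Python: a paired t statistic built from within-case differences, and a two-sample t statistic that assumes equal population variances and pools the sample variances. Both choices are the standard textbook tests, assumed here rather than prescribed by the question.

```python
import math
import statistics

# (a) matched pairs: leg (A) and heart (B) readings for the same case
leg = [153, 92, 186, 242, 55, 80, 126, 161, 302, 145, 39, 76]
heart = [161, 93, 186, 244, 58, 82, 124, 167, 321, 149, 51, 81]
d = [b - a for a, b in zip(leg, heart)]  # within-case differences
t_paired = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))  # 11 df

# (b) independent samples: pooled two-sample t statistic (equal variances assumed)
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0, 20.3, 20.9]
reed = [23.2, 22.0, 22.2, 21.2, 21.6, 21.6, 21.9, 22.0, 22.9, 22.8]
n1, n2 = len(wren), len(reed)
sp2 = ((n1 - 1) * statistics.variance(wren)
       + (n2 - 1) * statistics.variance(reed)) / (n1 + n2 - 2)
t_two = ((statistics.mean(reed) - statistics.mean(wren))
         / math.sqrt(sp2 * (1 / n1 + 1 / n2)))  # 18 df

print(round(t_paired, 3), round(t_two, 3))  # 2.995 2.958
```

For two-sided tests these can be compared with $t_{11}(2.5) = 2.201$ and $t_{18}(0.5) = 2.878$ from the t-table.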

B4. (a) The random variables $X$ and $Y$ have means $\mu_X$ and $\mu_Y$ respectively, variances $\sigma_X^2$ and $\sigma_Y^2$ respectively, and the correlation coefficient between them is $\rho$. Write down the mean and variance of $aX + bY$, where $a$ and $b$ are constants.

(b) An unbiased six-sided die is rolled $n$ times. Let $X_1$ denote the total number of 1s observed in the $n$ rolls, and $X_2$ denote the total number of 2s observed in the $n$ rolls. Both $X_1$ and $X_2$ have binomial distributions. Explain briefly why this is so and state the parameters of the binomial distributions.

(c) What are the variances of $X_1$ and $X_2$?

(d) The random variable $U = X_1 + X_2$ gives the total number of 1s and 2s observed in the $n$ rolls. By considering the distribution of the random variable $U$ and hence obtaining its variance, or otherwise, deduce that the correlation coefficient between $X_1$ and $X_2$ is $\rho = -\frac{1}{5}$.

(e) Obtain the variance of the difference $V = X_1 - X_2$.

(f) Determine the correlation coefficient between $U$ and $V$.

(g) Describe briefly how you could verify whether $U$ and $V$ are independent. (Explicit calculation is not required.)
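The variance argument in B4(d) to (f) can be checked exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` and the binomial variance $np(1-p)$; the function name `die_count_moments` is illustrative only.

```python
from fractions import Fraction

def die_count_moments(n):
    """Exact moments for X1 = number of 1s and X2 = number of 2s in n rolls of a fair die."""
    p = Fraction(1, 6)
    var_x1 = n * p * (1 - p)           # X1 ~ Bin(n, 1/6), so Var = 5n/36
    var_x2 = var_x1                    # X2 has the same distribution
    q = Fraction(1, 3)
    var_u = n * q * (1 - q)            # U = X1 + X2 ~ Bin(n, 1/3), so Var = 8n/36
    # Var(U) = Var(X1) + Var(X2) + 2 Cov(X1, X2), solved for the covariance
    cov = (var_u - var_x1 - var_x2) / 2
    rho = cov / var_x1                 # equal variances, so sd(X1) * sd(X2) = Var(X1)
    var_v = var_x1 + var_x2 - 2 * cov  # V = X1 - X2
    cov_uv = var_x1 - var_x2           # Cov(X1 + X2, X1 - X2) by bilinearity
    return rho, var_v, cov_uv

rho, var_v, cov_uv = die_count_moments(12)
print(rho, var_v, cov_uv)  # -1/5 4 0
```

Because $\mathrm{Cov}(U, V) = 0$, the correlation between $U$ and $V$ is zero for every $n$, while $\mathrm{Var}(V) = n/3$.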

Normal Distribution Function Tables

The first table gives
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt$$
and this corresponds to the shaded area in the figure to the right. $\Phi(x)$ is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to $x$. When $x < 0$ use $\Phi(x) = 1 - \Phi(-x)$, as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula
$$\Phi(x) \approx \Phi(x_1) + \frac{x - x_1}{x_2 - x_1}\bigl(\Phi(x_2) - \Phi(x_1)\bigr).$$

[Figure: standard normal density for x from -3 to 3, with the area to the left of x shaded.]

Table 1
x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)
0.00  0.5000  0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938
0.05  0.5199  0.55  0.7088  1.05  0.8531  1.55  0.9394  2.05  0.9798  2.55  0.9946
0.10  0.5398  0.60  0.7257  1.10  0.8643  1.60  0.9452  2.10  0.9821  2.60  0.9953
0.15  0.5596  0.65  0.7422  1.15  0.8749  1.65  0.9505  2.15  0.9842  2.65  0.9960
0.20  0.5793  0.70  0.7580  1.20  0.8849  1.70  0.9554  2.20  0.9861  2.70  0.9965
0.25  0.5987  0.75  0.7734  1.25  0.8944  1.75  0.9599  2.25  0.9878  2.75  0.9970
0.30  0.6179  0.80  0.7881  1.30  0.9032  1.80  0.9641  2.30  0.9893  2.80  0.9974
0.35  0.6368  0.85  0.8023  1.35  0.9115  1.85  0.9678  2.35  0.9906  2.85  0.9978
0.40  0.6554  0.90  0.8159  1.40  0.9192  1.90  0.9713  2.40  0.9918  2.90  0.9981
0.45  0.6736  0.95  0.8289  1.45  0.9265  1.95  0.9744  2.45  0.9929  2.95  0.9984
0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938  3.00  0.9987

The inverse function $\Phi^{-1}(p)$ is tabulated below for various values of $p$.

Table 2
p           0.900   0.950   0.975   0.990   0.995   0.999   0.9995
Φ^{-1}(p)   1.2816  1.6449  1.9600  2.3263  2.5758  3.0902  3.2905
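The interpolation formula and the symmetry rule can be tried out in Python. This sketch uses the Table 1 entries $\Phi(1.10) = 0.8643$ and $\Phi(1.15) = 0.8749$, and compares the interpolated value against an erf-based $\Phi$, which is assumed here as a reference rather than taken from the tables.

```python
import math

def phi(x):
    """Standard normal CDF Phi(x), computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interpolate(x, x1, phi1, x2, phi2):
    """Linear interpolation between tabulated pairs (x1, Phi(x1)) and (x2, Phi(x2))."""
    return phi1 + (x - x1) / (x2 - x1) * (phi2 - phi1)

# Phi(1.12) from the Table 1 entries at x = 1.10 and x = 1.15
approx = interpolate(1.12, 1.10, 0.8643, 1.15, 0.8749)

# Negative argument via symmetry: Phi(-1.12) = 1 - Phi(1.12)
lower = 1.0 - approx

print(round(approx, 4), round(phi(1.12), 4))  # 0.8685 0.8686
```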

Percentage Points of the t-distribution

This table gives the percentage points $t_\nu(P)$ for various values of $P$ and degrees of freedom $\nu$, as indicated by the figure to the right. The lower percentage points are given by symmetry as $-t_\nu(P)$, and the probability that $|t| \ge t_\nu(P)$ is $2P/100$. The limiting distribution of $t$ as $\nu \to \infty$ is the normal distribution with zero mean and unit variance.

[Figure: t density with the upper-tail area P/100 to the right of $t_\nu(P)$ shaded.]

Percentage points P
ν       10      5     2.5       1     0.5      0.1     0.05
1    3.078  6.314  12.706  31.821  63.657  318.309  636.619
2    1.886  2.920   4.303   6.965   9.925   22.327   31.599
3    1.638  2.353   3.182   4.541   5.841   10.215   12.924
4    1.533  2.132   2.776   3.747   4.604    7.173    8.610
5    1.476  2.015   2.571   3.365   4.032    5.893    6.869
6    1.440  1.943   2.447   3.143   3.707    5.208    5.959
7    1.415  1.895   2.365   2.998   3.499    4.785    5.408
8    1.397  1.860   2.306   2.896   3.355    4.501    5.041
9    1.383  1.833   2.262   2.821   3.250    4.297    4.781
10   1.372  1.812   2.228   2.764   3.169    4.144    4.587
11   1.363  1.796   2.201   2.718   3.106    4.025    4.437
12   1.356  1.782   2.179   2.681   3.055    3.930    4.318
13   1.350  1.771   2.160   2.650   3.012    3.852    4.221
14   1.345  1.761   2.145   2.624   2.977    3.787    4.140
15   1.341  1.753   2.131   2.602   2.947    3.733    4.073
16   1.337  1.746   2.120   2.583   2.921    3.686    4.015
18   1.330  1.734   2.101   2.552   2.878    3.610    3.922
21   1.323  1.721   2.080   2.518   2.831    3.527    3.819
25   1.316  1.708   2.060   2.485   2.787    3.450    3.725
30   1.310  1.697   2.042   2.457   2.750    3.385    3.646
40   1.303  1.684   2.021   2.423   2.704    3.307    3.551
50   1.299  1.676   2.009   2.403   2.678    3.261    3.496
70   1.294  1.667   1.994   2.381   2.648    3.211    3.435
100  1.290  1.660   1.984   2.364   2.626    3.174    3.390
∞    1.282  1.645   1.960   2.326   2.576    3.090    3.291
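Since the limiting distribution of $t$ is the standard normal, the $\nu = \infty$ row should coincide with standard normal upper percentage points. A short check with Python's `statistics.NormalDist` (standard library, Python 3.8+) is sketched below.

```python
from statistics import NormalDist

# Upper-tail percentages P used as column headings in the t-table
percentages = [10, 5, 2.5, 1, 0.5, 0.1, 0.05]
z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Upper P% point is the (1 - P/100) quantile of N(0, 1)
limit_row = [round(z.inv_cdf(1 - P / 100), 3) for P in percentages]
print(limit_row)
```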

Percentage Points of the $\chi^2$-Distribution

This table gives the percentage points $\chi^2_\nu(P)$ for various values of $P$ and degrees of freedom $\nu$, as indicated by the figure to the right, plotted in the case $\nu = 3$. If $X$ is a variable distributed as $\chi^2$ with $\nu$ degrees of freedom, $P/100$ is the probability that $X \ge \chi^2_\nu(P)$. For $\nu > 100$, $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2\nu - 1}$ and unit variance.

[Figure: $\chi^2_3$ density with the upper-tail area P/100 to the right of $\chi^2_\nu(P)$ shaded.]

Percentage points P
ν         10        5      2.5        1      0.5      0.1     0.05
1      2.706    3.841    5.024    6.635    7.879   10.828   12.116
2      4.605    5.991    7.378    9.210   10.597   13.816   15.202
3      6.251    7.815    9.348   11.345   12.838   16.266   17.730
4      7.779    9.488   11.143   13.277   14.860   18.467   19.997
5      9.236   11.070   12.833   15.086   16.750   20.515   22.105
6     10.645   12.592   14.449   16.812   18.548   22.458   24.103
7     12.017   14.067   16.013   18.475   20.278   24.322   26.018
8     13.362   15.507   17.535   20.090   21.955   26.124   27.868
9     14.684   16.919   19.023   21.666   23.589   27.877   29.666
10    15.987   18.307   20.483   23.209   25.188   29.588   31.420
11    17.275   19.675   21.920   24.725   26.757   31.264   33.137
12    18.549   21.026   23.337   26.217   28.300   32.909   34.821
13    19.812   22.362   24.736   27.688   29.819   34.528   36.478
14    21.064   23.685   26.119   29.141   31.319   36.123   38.109
15    22.307   24.996   27.488   30.578   32.801   37.697   39.719
16    23.542   26.296   28.845   32.000   34.267   39.252   41.308
17    24.769   27.587   30.191   33.409   35.718   40.790   42.879
18    25.989   28.869   31.526   34.805   37.156   42.312   44.434
19    27.204   30.144   32.852   36.191   38.582   43.820   45.973
20    28.412   31.410   34.170   37.566   39.997   45.315   47.498
25    34.382   37.652   40.646   44.314   46.928   52.620   54.947
30    40.256   43.773   46.979   50.892   53.672   59.703   62.162
40    51.805   55.758   59.342   63.691   66.766   73.402   76.095
50    63.167   67.505   71.420   76.154   79.490   86.661   89.561
80    96.578  101.879  106.629  112.329  116.321  124.839  128.261

END
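The large-$\nu$ rule above can be inverted to approximate a percentage point: if $\sqrt{2X} \approx N(\sqrt{2\nu - 1}, 1)$, then $\chi^2_\nu(P) \approx \frac{1}{2}\left(z_P + \sqrt{2\nu - 1}\right)^2$, where $z_P$ is the upper $P\%$ point of $N(0,1)$. The sketch below applies it at $\nu = 80$, below the stated range $\nu > 100$, purely to show the size of the error against a tabulated value; the function name is illustrative.

```python
import math
from statistics import NormalDist

def chi2_upper_point_approx(nu, P):
    """Approximate chi-squared upper P% point, assuming sqrt(2X) ~ N(sqrt(2nu - 1), 1)."""
    z = NormalDist().inv_cdf(1 - P / 100)  # upper P% point of N(0, 1)
    return (z + math.sqrt(2 * nu - 1)) ** 2 / 2

# Compare with the tabulated chi^2_80(5) = 101.879
approx_80_5 = chi2_upper_point_approx(80, 5)
print(round(approx_80_5, 1))  # 101.6
```

Even at $\nu = 80$ the approximation is within about 0.3 of the table entry.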