OXFORD CAMBRIDGE AND RSA EXAMINATIONS Advanced Subsidiary General Certificate of Education Advanced General Certificate of Education MEI STRUCTURED MATHEMATICS 4767 Statistics 2 Thursday 9 JUNE 2005 Morning 1 hour 30 minutes Additional materials: Answer booklet Graph paper MEI Examination Formulae and Tables (MF2) TIME 1 hour 30 minutes INSTRUCTIONS TO CANDIDATES Write your name, centre number and candidate number in the spaces provided on the answer booklet. Answer all the questions. You are permitted to use a graphical calculator in this paper. INFORMATION FOR CANDIDATES The number of marks is given in brackets [ ] at the end of each question or part question. You are advised that an answer may receive no marks unless you show sufficient detail of the working to indicate that a correct method is being used. Final answers should be given to a degree of accuracy appropriate to the context. The total number of marks for this paper is 72. This question paper consists of 5 printed pages and 3 blank pages. HN/3 OCR 2005 [L/102/2657] Registered Charity 1066969 [Turn over
2 1 A student is collecting data on traffic arriving at a motorway service station during weekday lunchtimes. The random variable X denotes the number of cars arriving in a randomly chosen period of ten seconds. (i) State two assumptions necessary if a Poisson distribution is to provide a suitable model for the distribution of X. Comment briefly on whether these assumptions are likely to be valid. [4] The student counts the number of arrivals, x, in each of 100 ten-second periods. The data are shown in the table below. x 0 1 2 3 4 5 >5 Frequency, f 18 39 20 12 8 3 0 (ii) Show that the sample mean is 1.62 and calculate the sample variance. [3] (iii) Do your calculations in part (ii) support the suggestion that a Poisson distribution is a suitable model for the distribution of X? Explain your answer. [1] For the remainder of this question you should assume that X may be modelled by a Poisson distribution with mean 1.62. (iv) Find P(X 2 ). Comment on your answer in relation to the data in the table. [4] (v) Find the probability that at least ten cars arrive in a period of 50 seconds during weekday lunchtimes. [3] (vi) Use a suitable approximating distribution to find the probability that no more than 550 cars arrive in a randomly chosen period of one hour during weekday lunchtimes. [4]
3 2 The fuel economy of a car varies from day to day according to weather and driving conditions. Fuel economy is measured in miles per gallon (mpg). The fuel economy of a particular petrol-fuelled type of car is known to be Normally distributed with mean 38.5 mpg and standard deviation 4.0 mpg. (i) Find the probability that on a randomly selected day the fuel economy of a car of this type will be above 45.0 mpg. [4] (ii) The manufacturer wishes to quote a fuel economy figure which will be exceeded on 90% of days. What figure should be quoted? [3] The daily fuel economy of a similar type of car which is diesel-fuelled is known to be Normally distributed with mean 51.2 mpg and unknown standard deviation mpg. (iii) Given that on 75% of days the fuel economy of this type of car is below 55.0 mpg, show that [3] s 5.63. (iv) Draw a sketch to illustrate both distributions on a single diagram. [4] (v) Find the probability that the fuel economy of either the petrol or the diesel model (or both) will be above 45.0 mpg on a randomly selected day. You may assume that the fuel economies of the two models are independent. [4] s [Turn over
4 3 In a triathlon, competitors have to swim 600 metres, cycle 40 kilometres and run 10 kilometres. To improve her strength, a triathlete undertakes a training programme in which she carries weights in a rucksack whilst running. She runs a specific course and notes the total time taken for each run. Her coach is investigating the relationship between time taken and weight carried. The times taken with eight different weights are illustrated on the scatter diagram below, together with the summary statistics for these data. The variables x and y represent weight carried in kilograms and time taken in minutes respectively. y 29.0 28.0 27.0 26.0 25.0 0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 x Summary statistics: n 8, x 36, y 214.8, x 2 204, y 2 5775.28, xy 983.6. (i) Calculate the equation of the regression line of y on x. [5] On one of the eight runs, the triathlete was carrying 4 kilograms and took 27.5 minutes. On this run she was delayed when she tripped and fell over. (ii) Calculate the value of the residual for this weight. [3] (iii) The coach decides to recalculate the equation of the regression line without the data for this run. Would it be preferable to use this recalculated equation or the equation found in part (i) to estimate the delay when the triathlete tripped and fell over? Explain your answer. [2] The triathlete s coach claims that there is positive correlation between cycling and swimming times in triathlons. The product moment correlation coefficient of the times of twenty randomly selected competitors in these two sections is 0.209. (iv) Carry out a hypothesis test at the 5% level to examine the coach s claim, explaining your conclusions clearly. [5] (v) What distributional assumption is necessary for this test to be valid? How can you use a scatter diagram to decide whether this assumption is likely to be true? [2]
5 4 (a) The selling prices of semi-detached houses in the suburbs of a particular city are known to be Normally distributed with mean 166 500 and standard deviation 14 200. A householder on one large estate claims that houses on her estate have a higher mean selling price. The selling prices of six randomly selected houses on her estate are 180 000, 152 000, 156 500, 172 000, 189 000, 169 000. (i) State suitable null and alternative hypotheses to test her claim. [2] (ii) Carry out the test at the 5% level of significance, stating your conclusions clearly. You may assume that the standard deviation of the selling prices of houses on this estate is 14 200. [6] (b) The manager of a restaurant undertakes a survey of the numbers and types of drinks ordered by a random sample of 400 customers. Customers are categorized as business, tourist or local. The drinks are categorized as alcoholic or soft drinks. A table of results of the survey is as follows. Type of drink Row Alcoholic Soft drinks totals Type Business 54 63 117 of Tourist 95 41 136 customer Local 71 76 147 Column totals 220 180 400 Carry out a test at the 5% level of significance to examine whether there is any association between type of customer and type of drink. State carefully your null and alternative hypotheses. [10]
6 BLANK PAGE
7 BLANK PAGE
8 BLANK PAGE Permission to reproduce items where third-party owned material protected by copyright is included has been sought and cleared where possible. Every reasonable effort has been made by the publisher (OCR) to trace copyright holders, but if any items requiring clearance have unwittingly been included, the publisher will be pleased to make amends at the earliest possible opportunity. OCR is part of the Cambridge Assessment Group. Cambridge Assessment is the brand name of University of Cambridge Local Examinations Syndicate (UCLES), which is itself a department of the University of Cambridge.