© UNIVERSITY OF LEEDS. Examination for the Module MATH1725 (May-June 2009): INTRODUCTION TO STATISTICS. Time allowed: 2 hours


This question paper consists of 11 printed pages, each of which is identified by the reference MATH1725. Only approved basic scientific calculators may be used. Statistical tables are provided at the end of the exam paper.

Attempt ALL questions in Section A and TWO questions from Section B. Questions A1 to A10 require you to write down a single letter answer. Questions A11 to A20 require you to write down a short explanation. Your answers to Section A questions and Section B questions may be written in the same answer book. Sections A and B are each worth 50% of the examination marks. Questions A1 to A20 carry equal weight.

SECTION A

Attempt ALL questions in Section A. Questions A1 to A10 require you to write down a single letter answer.

A1. The incomes (in units of 1000) of ten directors of Marks and Spencer plc during 2008 were: 453, 1375, 698, 293, 701, 73, 57, 68, 79, 73. What does the sample median equal?
A 57, B 137.5, C 186, D 293, E 387.

A2. If $Z \sim N(0, 1)$, what is the value of $P(Z > 1.15)$?
A 0.1251, B 0.4404, C 0.8500, D 0.8749, E 1.0000.

A3. Suppose variables $X_1, X_2, \ldots, X_n$ have common mean $\mu$ and variance $\sigma^2$. Their mean $\bar{X}$ is said to be an unbiased estimator of $\mu$. What does this tell you?
A $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, B $E[\bar{X}] = \mu$, C $\mathrm{Var}[\bar{X}] = \sigma^2/n$, D $\mu$ is a special measure of spread.

A4. A sample correlation coefficient $r_{xy}$ equals $-1$. What does this definitely tell you about the corresponding scatter plot of $y$ against $x$?
A data points closely scattered about a straight line, B data points all lie on a straight line with slope 1, C data points all lie on a straight line with zero slope, D data points all lie on a straight line with negative slope.

A5. A least squares regression problem has $n$ pairs of data $(x_i, y_i)$, $i = 1, 2, \ldots, n$. The fitted least squares regression line is $y = \hat{\alpha} + \hat{\beta}x$. Which quantity is minimised to derive $\hat{\alpha}$ and $\hat{\beta}$?
A $\sum_{i=1}^{n} y_i - \alpha - \beta x_i$, B $\sum_{i=1}^{n} (y_i + \alpha - \beta x_i)^2$, C $\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$, D $\sum_{i=1}^{n} (y_i + \alpha + \beta x_i)^2$.
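The arithmetic behind A1 and A2 can be checked with a short Python sketch. It is an illustration, not part of the paper, and assumes only Python 3's standard `statistics` and `math` modules; the helper name `upper_tail` is hypothetical. With ten values the median is the mean of the 5th and 6th ordered values, and $P(Z > z) = 1 - \Phi(z)$ is obtained from the complementary error function.

```python
import math
import statistics

# A1: incomes (in units of 1000) of the ten directors, as listed in the question
incomes = [453, 1375, 698, 293, 701, 73, 57, 68, 79, 73]

# For an even sample size, statistics.median averages the two middle ordered values
median_income = statistics.median(incomes)


def upper_tail(z: float) -> float:
    """Return P(Z > z) for Z ~ N(0, 1), using 1 - Phi(z) = erfc(z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    print(median_income)               # 186.0
    print(round(upper_tail(1.15), 4))  # 0.1251
```

The second value agrees with $1 - \Phi(1.15) = 1 - 0.8749$ read from Table 1 of the normal tables.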

A6. The boxplot below shows the heights in metres for 25 male students and 25 female students.

[Figure: boxplots of height (m) for females and males, on an axis running from 1.55 m to 1.85 m.]

Which of the following statements are true?
(i) The median height of males is less than the median height of females.
(ii) The semi-interquartile range of female heights is about 0.115 m.
(iii) The variability of male and female heights is about the same.
A (ii) only, B (i) and (ii), C (ii) and (iii), D (iii) only.

A7. Random variables $X$ and $Y$ have correlation coefficient 0.5. If $X$ has mean 2 and variance 4, and $Y$ has mean 1 and variance 1, what is the mean of $X - 2Y$?
A $-2$, B $-1$, C 0, D 1, E 2.

A8. In question A7 above, what is the variance of $X - 2Y$?
A $-2$, B 0, C 2, D 4, E 6, F 7.

A9. If random variables $X$ and $Y$ each have variance equal to 3 and $X + Y$ has variance 8, what does the covariance between $X$ and $Y$ equal?
A 0, B 1, C 2, D 3, E not enough information to say.

A10. For the $\chi^2$-distribution with 5 degrees of freedom, what is the value of $\chi^2_5(10\%)$?
A 9.236, B 11.07, C 15.09, D 15.99, E 18.31.
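A7 to A9 all rest on the rules $E[aX + bY] = a\mu_X + b\mu_Y$ and $\mathrm{Var}[aX + bY] = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\mathrm{Cov}(X, Y)$, with $\mathrm{Cov}(X, Y) = \rho\,\sigma_X\sigma_Y$. A minimal Python sketch of those rules follows; the function name `linear_combination_moments` is hypothetical and only illustrates the formulas.

```python
import math


def linear_combination_moments(a, b, mu_x, mu_y, var_x, var_y, rho):
    """Mean and variance of aX + bY from the means, variances and correlation rho."""
    cov_xy = rho * math.sqrt(var_x) * math.sqrt(var_y)  # Cov(X, Y) = rho * sd_X * sd_Y
    mean = a * mu_x + b * mu_y
    variance = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy
    return mean, variance


# A7/A8: W = X - 2Y with E[X] = 2, Var X = 4, E[Y] = 1, Var Y = 1, rho = 0.5
mean_w, var_w = linear_combination_moments(1, -2, 2, 1, 4, 1, 0.5)

# A9: Var(X + Y) = Var X + Var Y + 2 Cov(X, Y), so Cov(X, Y) = (8 - 3 - 3) / 2
cov_a9 = (8 - 3 - 3) / 2
```

Note that a negative value can never be a variance, which rules out option A in A8 before any calculation.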

Questions A11 to A20 require you to write down a short explanation.

A11. For a set of $n$ observations, what is a dot-plot?

A12. Briefly describe the central limit theorem.

A13. A sample of $n = 16$ values has sample mean $\bar{x} = 1.44$ and sample variance $s^2 = 1.44$. Is the sample mean significantly different from zero?

A14. Values $x_i$ and $y_i$, $i = 1, 2, \ldots, n$, lie on a horizontal line $y = c$, where $c$ is a constant. What does the sample covariance $s_{XY}$ equal?

A15. Random variables $X$ and $Y$ are both discrete with joint probability function $p(x_i, y_j)$. How would you calculate the marginal probability function of $X$, $p_X(x_i)$?

A16. In question A15 above, how would you calculate $E[XY]$?

A17. A random sample of size $n$ is taken with replacement from a population of size $N$. If the population consists of $R$ individuals of type A and $N - R$ of type B, what is the probability that the sample contains $r$ of type A?

A18. In question A17 above, if $n$ is large and $R/N$ is close to $\frac{1}{2}$, what continuous distribution could be used as an approximation when calculating the required probability? (State also the mean and variance of this distribution.)

A19. In a sample of 161 first year Leeds University students, 21 did more than 5 hours of paid work in a given week of term. Use these data to obtain an approximate 95% confidence interval for the proportion of Leeds University students who do more than 5 hours of paid work in a week of term.

A20. In a chi-squared test with ten groups, the observed value of the chi-squared test statistic under some null hypothesis $H_0$ is $\chi^2_{\mathrm{obs}}$. What would extremely small values of $\chi^2_{\mathrm{obs}}$ suggest about the experimental data?
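For A13 and A19, the standard calculations can be sketched in Python. This assumes the one-sample $t$ statistic $t = \bar{x}/(s/\sqrt{n})$ with $n - 1 = 15$ degrees of freedom for A13, and the normal-approximation interval $\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/m}$ for A19; the variable names are illustrative only.

```python
import math

# A13: one-sample t statistic for H0: mu = 0 with n = 16, xbar = 1.44, s^2 = 1.44
n, xbar, s2 = 16, 1.44, 1.44
t_stat = xbar / math.sqrt(s2 / n)  # compare with t_15(0.5) = 2.947 for a two-sided 1% test

# A19: approximate 95% confidence interval for a proportion
m, successes = 161, 21
phat = successes / m
half_width = 1.96 * math.sqrt(phat * (1 - phat) / m)  # 1.96 = Phi^{-1}(0.975)
ci = (phat - half_width, phat + half_width)
```

Here $t = 1.44/0.3 = 4.8$, well beyond the tabulated 1% point, and the interval for A19 is roughly $(0.078, 0.183)$.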

SECTION B

Attempt TWO questions from Section B.

B1. The following data give the heights of 100 male students at a certain university.

Height (inches)   Number of students
63-65             2
66-68             11
69-71             33
72-74             43
75-77             11

(a) Calculate the sample mean and variance for these data.

(b) A suitable normal distribution is fitted to these data and some expected frequencies have been determined as shown in the table below.

Height (inches)   Observed frequency   Expected frequency
63-65             2                    1.3
66-68             11                   12.1
69-71             33
72-74             43
75-77             11

Determine the expected frequencies for the remaining class intervals.

(c) Test whether your fitted normal distribution gives a good fit to these data.
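One way to carry out B1 numerically is sketched below in Python. It rests on stated assumptions rather than on anything fixed by the paper: each observation is placed at its class midpoint, the variance uses divisor $n - 1$, class boundaries lie at 65.5, 68.5, 71.5 and 74.5 with the two end classes open-ended, and the first class (expected count below 5) is pooled with the second before the chi-squared test. These choices reproduce the given expected frequencies 1.3 and 12.1. The helper `phi` is hypothetical.

```python
import math

# Class midpoints (inches) and observed frequencies from the B1 table
midpoints = [64, 67, 70, 73, 76]
observed = [2, 11, 33, 43, 11]
n = sum(observed)

# (a) Grouped sample mean and variance (divisor n - 1)
mean = sum(f * x for f, x in zip(observed, midpoints)) / n
var = sum(f * (x - mean) ** 2 for f, x in zip(observed, midpoints)) / (n - 1)
sd = math.sqrt(var)


def phi(z: float) -> float:
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# (b) Expected frequencies under N(mean, var).
#     ASSUMPTION: boundaries at the half-inch points, end classes open-ended.
bounds = [-math.inf, 65.5, 68.5, 71.5, 74.5, math.inf]
expected = [
    n * (phi((hi - mean) / sd) - phi((lo - mean) / sd))
    for lo, hi in zip(bounds[:-1], bounds[1:])
]

# (c) Pearson chi-squared statistic.
#     ASSUMPTION: the first class is pooled with the second (expected count < 5).
obs_pooled = [observed[0] + observed[1]] + observed[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
# 4 groups, minus 1 for the fixed total, minus 2 for the estimated mean and variance
df = len(obs_pooled) - 1 - 2
```

The resulting statistic (about 1.94 on 1 degree of freedom) would then be compared with $\chi^2_1(5\%) = 3.841$ from the tables.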

B2. (a) Pairs of measurements $(x_i, y_i)$, $i = 1, 2, \ldots, n$, are made on each of $n$ individuals. The least squares regression line for $y$ given $x$ is $y = \alpha + \beta x$. Derive the least squares estimates of $\alpha$ and $\beta$.

(b) The anxiety level of subjects in a certain stress situation was assessed using two different procedures.
(I) The state-trait anxiety inventory (STAI), consisting of twenty questions.
(II) The linear analogue (LA) score, in which the subject is asked to indicate on a 100 mm scale their perceived anxiety level, with 0 mm on the scale corresponding to the statement "I do not feel anxious at all" and 100 mm on the scale corresponding to the statement "I could not feel more anxious".

For ten subjects the STAI and LA scores are given below.

Subject i   STAI score $y_i$   LA score $x_i$
1           20                 10
2           25                 0
3           29                 37
4           33                 28
5           36                 8
6           42                 47
7           45                 38
8           49                 39
9           49                 94
10          59                 78

$\sum_i y_i^2 = 16323$, $\sum_i x_i^2 = 22411$, $\sum_i x_i y_i = 17288$.

(i) Fit a least squares regression line for predicting the STAI score given an LA score.
(ii) Use your fitted regression line to predict the STAI score for an LA score $x = 20$.

(c) Define the residuals $r_i$ for your fitted model and show that the residual for subject 5 equals 7.03.
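The least squares estimates in B2 take the form $\hat{\beta} = S_{xy}/S_{xx}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, where $S_{xx} = \sum x_i^2 - n\bar{x}^2$ and $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$, with residuals $r_i = y_i - \hat{\alpha} - \hat{\beta}x_i$. A minimal Python sketch applying them to the table follows; the helper name `predict` is hypothetical.

```python
# STAI scores y and LA scores x for the ten subjects in B2
x = [10, 0, 37, 28, 8, 47, 38, 39, 94, 78]
y = [20, 25, 29, 33, 36, 42, 45, 49, 49, 59]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and products
s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar


def predict(x_new: float) -> float:
    """Fitted STAI score for an LA score x_new."""
    return alpha_hat + beta_hat * x_new


# Residuals r_i = y_i - (alpha_hat + beta_hat * x_i); they sum to zero
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
```

With full precision the fit is roughly $\hat{y} = 26.36 + 0.3257x$, the prediction at $x = 20$ is about 32.87, and the residual for subject 5 is about 7.04; the value 7.03 in (c) results from using the rounded coefficients.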

B3. (a) A study of blood alcohol levels (in mg/litre) at post-mortem examinations of road accident victims involved taking one blood sample from the leg (column A) and another from the heart (column B). The results are tabulated below.

Case   A     B
1      153   161
2      92    93
3      186   186
4      242   244
5      55    58
6      80    82
7      126   124
8      161   167
9      302   321
10     145   149
11     39    51
12     76    81

Do these results indicate that there is a significant difference in blood alcohol levels for the same individual in the leg compared with the heart? Why is it reasonable to suppose these data can be regarded as matched pairs?

(b) The following data give the length (in mm) of cuckoo (Cuculus canorus) eggs found in nests belonging to wrens (A) and reed warblers (B).

A: 19.8 22.1 21.5 20.9 22.0 21.0 22.3 21.0 20.3 20.9
B: 23.2 22.0 22.2 21.2 21.6 21.6 21.9 22.0 22.9 22.8

Is there any evidence at the 1% level to suggest that the egg size differs between the two host species? Why is it unreasonable to suppose these data can be regarded as matched pairs?

(c) What do you understand by the phrase "matched pairs"?
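The two parts of B3 call for different tests, and a Python sketch of both may make the contrast concrete. Part (a) is treated as a paired $t$ test on the differences $d_i = B_i - A_i$; part (b) is treated, as an assumption, as a pooled two-sample $t$ test with equal population variances. Only the standard `statistics` and `math` modules are used.

```python
import math
import statistics

# (a) Matched pairs: analyse the within-case differences heart - leg
leg = [153, 92, 186, 242, 55, 80, 126, 161, 302, 145, 39, 76]
heart = [161, 93, 186, 244, 58, 82, 124, 167, 321, 149, 51, 81]
d = [b - a for a, b in zip(leg, heart)]
t_paired = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
df_paired = len(d) - 1  # compare with t_11(2.5) = 2.201 and t_11(0.5) = 3.106

# (b) Independent samples. ASSUMPTION: equal variances, so pool them.
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0, 20.3, 20.9]
reed = [23.2, 22.0, 22.2, 21.2, 21.6, 21.6, 21.9, 22.0, 22.9, 22.8]
n1, n2 = len(wren), len(reed)
sp2 = ((n1 - 1) * statistics.variance(wren)
       + (n2 - 1) * statistics.variance(reed)) / (n1 + n2 - 2)
t_two = (statistics.mean(reed) - statistics.mean(wren)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_two = n1 + n2 - 2  # two-sided 1% test: compare |t| with t_18(0.5) = 2.878
```

The paired statistic is about 2.99 on 11 degrees of freedom and the two-sample statistic about 2.96 on 18 degrees of freedom; the conclusions then follow from the tabulated percentage points.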

B4. (a) The random variables $X$ and $Y$ have means $\mu_X$ and $\mu_Y$ respectively, variances $\sigma_X^2$ and $\sigma_Y^2$ respectively, and the correlation coefficient between them is $\rho$. Write down the mean and variance of $aX + bY$, where $a$ and $b$ are constants.

(b) An unbiased six-sided die is rolled $n$ times. Let $X_1$ denote the total number of 1s observed in the $n$ rolls, and $X_2$ denote the total number of 2s observed in the $n$ rolls. Both $X_1$ and $X_2$ have binomial distributions. Explain briefly why this is so and state the parameters of the binomial distributions.

(c) What are the variances of $X_1$ and $X_2$?

(d) The random variable $U = X_1 + X_2$ gives the total number of 1s and 2s observed in the $n$ rolls. By considering the distribution of the random variable $U$ and hence obtaining its variance, or otherwise, deduce that the correlation coefficient between $X_1$ and $X_2$ is $\rho = -\frac{1}{5}$.

(e) Obtain the variance of the difference $V = X_1 - X_2$.

(f) Determine the correlation coefficient between $U$ and $V$.

(g) Describe briefly how you could verify whether $U$ and $V$ are independent. (Explicit calculation is not required.)
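The results in B4 can be checked exactly for a small number of rolls by brute-force enumeration, which is a different route from the algebraic argument the question asks for. The Python sketch below lists all $6^n$ equally likely sequences for $n = 3$ and uses exact `Fraction` arithmetic; the function name `die_count_moments` is hypothetical, and enumeration is only practical for small $n$.

```python
from fractions import Fraction
from itertools import product


def die_count_moments(n: int):
    """Exact Var(X1), Var(X2) and Cov(X1, X2) for n rolls of a fair die,
    where X1 counts 1s and X2 counts 2s, by enumerating all 6**n sequences."""
    outcomes = list(product(range(1, 7), repeat=n))
    p = Fraction(1, len(outcomes))  # every sequence is equally likely
    counts = [(seq.count(1), seq.count(2)) for seq in outcomes]
    e1 = sum(p * c1 for c1, _ in counts)
    e2 = sum(p * c2 for _, c2 in counts)
    var1 = sum(p * (c1 - e1) ** 2 for c1, _ in counts)
    var2 = sum(p * (c2 - e2) ** 2 for _, c2 in counts)
    cov12 = sum(p * (c1 - e1) * (c2 - e2) for c1, c2 in counts)
    return var1, var2, cov12


n = 3
var1, var2, cov12 = die_count_moments(n)
rho12 = cov12 / var1             # valid because Var X1 = Var X2
var_u = var1 + var2 + 2 * cov12  # U = X1 + X2
var_v = var1 + var2 - 2 * cov12  # V = X1 - X2
cov_uv = var1 - var2             # Cov(X1 + X2, X1 - X2) = Var X1 - Var X2
```

For $n = 3$ this gives $\mathrm{Var}\,X_1 = 3 \cdot \frac{1}{6} \cdot \frac{5}{6}$, $\rho = -\frac{1}{5}$, $\mathrm{Var}\,V = n/3 = 1$ and $\mathrm{Cov}(U, V) = 0$. Zero correlation between $U$ and $V$ does not by itself establish independence.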

Normal Distribution Function Tables

The first table gives
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt,$$
and this corresponds to the shaded area in the figure to the right. $\Phi(x)$ is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to $x$. When $x < 0$ use $\Phi(x) = 1 - \Phi(-x)$, as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula
$$\Phi(x) \approx \Phi(x_1) + \frac{x - x_1}{x_2 - x_1}\bigl(\Phi(x_2) - \Phi(x_1)\bigr).$$

[Figure: standard normal density for $-3 \le x \le 3$, with the area $\Phi(x)$ to the left of $x$ shaded.]

Table 1

x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)    x     Φ(x)
0.00  0.5000  0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938
0.05  0.5199  0.55  0.7088  1.05  0.8531  1.55  0.9394  2.05  0.9798  2.55  0.9946
0.10  0.5398  0.60  0.7257  1.10  0.8643  1.60  0.9452  2.10  0.9821  2.60  0.9953
0.15  0.5596  0.65  0.7422  1.15  0.8749  1.65  0.9505  2.15  0.9842  2.65  0.9960
0.20  0.5793  0.70  0.7580  1.20  0.8849  1.70  0.9554  2.20  0.9861  2.70  0.9965
0.25  0.5987  0.75  0.7734  1.25  0.8944  1.75  0.9599  2.25  0.9878  2.75  0.9970
0.30  0.6179  0.80  0.7881  1.30  0.9032  1.80  0.9641  2.30  0.9893  2.80  0.9974
0.35  0.6368  0.85  0.8023  1.35  0.9115  1.85  0.9678  2.35  0.9906  2.85  0.9978
0.40  0.6554  0.90  0.8159  1.40  0.9192  1.90  0.9713  2.40  0.9918  2.90  0.9981
0.45  0.6736  0.95  0.8289  1.45  0.9265  1.95  0.9744  2.45  0.9929  2.95  0.9984
0.50  0.6915  1.00  0.8413  1.50  0.9332  2.00  0.9772  2.50  0.9938  3.00  0.9987

The inverse function $\Phi^{-1}(p)$ is tabulated below for various values of $p$.

Table 2

p         0.900   0.950   0.975   0.990   0.995   0.999   0.9995
Φ⁻¹(p)    1.2816  1.6449  1.9600  2.3263  2.5758  3.0902  3.2905
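The interpolation formula for Table 1 can be illustrated with a short Python sketch. It interpolates $\Phi(1.12)$ from the tabulated values at 1.10 and 1.15 and, purely as a check, compares the result with $\Phi$ computed from `math.erf`; the function names `interpolate_phi` and `phi` are hypothetical.

```python
import math

# Two adjacent entries copied from Table 1
X1, P1 = 1.10, 0.8643
X2, P2 = 1.15, 0.8749


def interpolate_phi(x: float, x1: float, p1: float, x2: float, p2: float) -> float:
    """Linear interpolation between two tabulated values of Phi."""
    return p1 + (x - x1) / (x2 - x1) * (p2 - p1)


def phi(x: float) -> float:
    """Standard normal distribution function, used here only as a check."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


approx = interpolate_phi(1.12, X1, P1, X2, P2)  # 0.8643 + 0.4 * 0.0106
# Negative arguments use the symmetry Phi(-x) = 1 - Phi(x)
approx_negative = 1.0 - approx
```

The interpolated value 0.86854 differs from the exact $\Phi(1.12)$ by only about $10^{-4}$, which is why linear interpolation is adequate for a table at this spacing.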

Percentage Points of the t-distribution

This table gives the percentage points $t_\nu(P)$ for various values of $P$ and degrees of freedom $\nu$, as indicated by the figure to the right. The lower percentage points are given by symmetry as $-t_\nu(P)$, and the probability that $|t| \ge t_\nu(P)$ is $2P/100$. The limiting distribution of $t$ as $\nu \to \infty$ is the normal distribution with zero mean and unit variance.

[Figure: t density with the upper-tail area $P/100$ beyond $t_\nu(P)$ shaded.]

         Percentage points P
ν        10      5       2.5      1        0.5      0.1      0.05
1        3.078   6.314   12.706   31.821   63.657   318.309  636.619
2        1.886   2.920   4.303    6.965    9.925    22.327   31.599
3        1.638   2.353   3.182    4.541    5.841    10.215   12.924
4        1.533   2.132   2.776    3.747    4.604    7.173    8.610
5        1.476   2.015   2.571    3.365    4.032    5.893    6.869
6        1.440   1.943   2.447    3.143    3.707    5.208    5.959
7        1.415   1.895   2.365    2.998    3.499    4.785    5.408
8        1.397   1.860   2.306    2.896    3.355    4.501    5.041
9        1.383   1.833   2.262    2.821    3.250    4.297    4.781
10       1.372   1.812   2.228    2.764    3.169    4.144    4.587
11       1.363   1.796   2.201    2.718    3.106    4.025    4.437
12       1.356   1.782   2.179    2.681    3.055    3.930    4.318
13       1.350   1.771   2.160    2.650    3.012    3.852    4.221
14       1.345   1.761   2.145    2.624    2.977    3.787    4.140
15       1.341   1.753   2.131    2.602    2.947    3.733    4.073
16       1.337   1.746   2.120    2.583    2.921    3.686    4.015
18       1.330   1.734   2.101    2.552    2.878    3.610    3.922
21       1.323   1.721   2.080    2.518    2.831    3.527    3.819
25       1.316   1.708   2.060    2.485    2.787    3.450    3.725
30       1.310   1.697   2.042    2.457    2.750    3.385    3.646
40       1.303   1.684   2.021    2.423    2.704    3.307    3.551
50       1.299   1.676   2.009    2.403    2.678    3.261    3.496
70       1.294   1.667   1.994    2.381    2.648    3.211    3.435
100      1.290   1.660   1.984    2.364    2.626    3.174    3.390
∞        1.282   1.645   1.960    2.326    2.576    3.090    3.291

Percentage Points of the $\chi^2$-Distribution

This table gives the percentage points $\chi^2_\nu(P)$ for various values of $P$ and degrees of freedom $\nu$, as indicated by the figure to the right, plotted in the case $\nu = 3$. If $X$ is a variable distributed as $\chi^2$ with $\nu$ degrees of freedom, $P/100$ is the probability that $X \ge \chi^2_\nu(P)$. For $\nu > 100$, $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2\nu - 1}$ and unit variance.

[Figure: $\chi^2$ density ($\nu = 3$) with the upper-tail area $P/100$ beyond $\chi^2_\nu(P)$ shaded.]

         Percentage points P
ν        10       5        2.5      1        0.5      0.1      0.05
1        2.706    3.841    5.024    6.635    7.879    10.828   12.116
2        4.605    5.991    7.378    9.210    10.597   13.816   15.202
3        6.251    7.815    9.348    11.345   12.838   16.266   17.730
4        7.779    9.488    11.143   13.277   14.860   18.467   19.997
5        9.236    11.070   12.833   15.086   16.750   20.515   22.105
6        10.645   12.592   14.449   16.812   18.548   22.458   24.103
7        12.017   14.067   16.013   18.475   20.278   24.322   26.018
8        13.362   15.507   17.535   20.090   21.955   26.124   27.868
9        14.684   16.919   19.023   21.666   23.589   27.877   29.666
10       15.987   18.307   20.483   23.209   25.188   29.588   31.420
11       17.275   19.675   21.920   24.725   26.757   31.264   33.137
12       18.549   21.026   23.337   26.217   28.300   32.909   34.821
13       19.812   22.362   24.736   27.688   29.819   34.528   36.478
14       21.064   23.685   26.119   29.141   31.319   36.123   38.109
15       22.307   24.996   27.488   30.578   32.801   37.697   39.719
16       23.542   26.296   28.845   32.000   34.267   39.252   41.308
17       24.769   27.587   30.191   33.409   35.718   40.790   42.879
18       25.989   28.869   31.526   34.805   37.156   42.312   44.434
19       27.204   30.144   32.852   36.191   38.582   43.820   45.973
20       28.412   31.410   34.170   37.566   39.997   45.315   47.498
25       34.382   37.652   40.646   44.314   46.928   52.620   54.947
30       40.256   43.773   46.979   50.892   53.672   59.703   62.162
40       51.805   55.758   59.342   63.691   66.766   73.402   76.095
50       63.167   67.505   71.420   76.154   79.490   86.661   89.561
80       96.578   101.879  106.629  112.329  116.321  124.839  128.261

END
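The large-$\nu$ rule for the $\chi^2$ table can be inverted to give a formula for the percentage point: if $\sqrt{2X} \approx N(\sqrt{2\nu - 1}, 1)$, then $\chi^2_\nu(P) \approx \frac{1}{2}\bigl(\sqrt{2\nu - 1} + z\bigr)^2$ with $z = \Phi^{-1}(1 - P/100)$. The Python sketch below applies it at $\nu = 80$, the last tabulated row, purely to compare with the table; the tables recommend the approximation only for $\nu > 100$, and the name `chi2_point_approx` is hypothetical.

```python
import math

Z_95 = 1.6449  # Phi^{-1}(0.95), from Table 2 of the normal tables


def chi2_point_approx(nu: int, z: float) -> float:
    """Approximate upper percentage point chi^2_nu(P) from
    sqrt(2X) ~ N(sqrt(2 nu - 1), 1), where z = Phi^{-1}(1 - P/100)."""
    return (math.sqrt(2 * nu - 1) + z) ** 2 / 2


approx_80 = chi2_point_approx(80, Z_95)  # tabulated value: 101.879
```

At $\nu = 80$ the approximation gives about 101.6 against the tabulated 101.879, so it is already within about 0.3% of the exact point below the recommended range.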